xx_perf_watcher for monitoring waits during short time intervals
A while ago I worked on a performance troubleshooting case where frequent short time degradation of IO performance on NetApp storage was suspected to be the root cause. The problem was to get some proof as looking at the averages of IO service time was not alarming enough. I decided to write a tool that could be used to monitor wait times for any DB wait event in short intervals, e.g. so I could get measures for db file sequential/scattered read performance each second. I thought for a while and figured out the requirements:
- It should be lightweight and easy to set up;
- the results have to be visible real-time;
- it should be possible to spool results to file;
- it should allow monitoring any wait event;
- it should allow to define interval length between measurement points.
In the end I came up with the solution – a pipelined function, with interval size and name of wait event as parameters, that “queries” the performance metrics using a simple select statement, making it possible to spool the results into a file and seeing results real-time. You can take a look at it in the video below. Continue reading if you’re interested in seeing the source code and reading some explanations on key implementation tricks that made this possible.
OK, here are some details on how the tool is written:
- It’s a pipelined function that returns SYSTEM.PA_VARCHAR2_240_TBL_TYPE type result, I chose this type as it was available on 10g and 11g and this way I didn’t need to create my own data types. Function is pipelined therefore it allows constructing a SELECT statement to return data from function that passes the results via PIPE ROW statement. Choosing this construction allows to use query syntax to get the results, note setting arraysize to 1 is really important to see the real-time results, SQL*Plus is fetching rows in arraysize sets and not displaying them until the whole set is received.
SQL> set lines 2000 pages 0 trimspool on arraysize 1
SQL> select * from table(cast(xx_perf_watcher.monitor_waits('db file sequential read',2) as system.PA_VARCHAR2_240_TBL_TYPE));
| WAITEVENT | TIMESTAMP | WAITS | TIME US | MS/EVENT |
|-----------------------------------------------------------------------------------------|
| db file sequential read | 16.06.2011 11:27:34.13 | 95030085 | 412378973106 | 4.34 |
| db file sequential read | 16.06.2011 11:27:34.32 | 21 | 82193 | 3.91 |
| db file sequential read | 16.06.2011 11:27:34.49 | 28 | 50584 | 1.81 |
| db file sequential read | 16.06.2011 11:27:34.67 | 13 | 35691 | 2.75 |
| db file sequential read | 16.06.2011 11:27:34.88 | 34 | 63947 | 1.88 |
| db file sequential read | 16.06.2011 11:27:35.06 | 6 | 21738 | 3.62 |
| db file sequential read | 16.06.2011 11:27:35.24 | 9 | 7127 | .79 |
| db file sequential read | 16.06.2011 11:27:35.42 | 8 | 19406 | 2.43 |
| db file sequential read | 16.06.2011 11:27:35.59 | 2 | 1653 | .83 |
| db file sequential read | 16.06.2011 11:27:35.77 | 47 | 120500 | 2.56 |
| db file sequential read | 16.06.2011 11:27:35.96 | 43 | 127720 | 2.97 |
| db file sequential read | 16.06.2011 11:27:36.15 | 45 | 109652 | 2.44 |
- function XX_PERF_WATCHER.MONITOR_WAITS requires 3 arguments:
- name of the wait event to be monitored, e.g. “db file sequential read”
- length of the interval in seconds, e.g. “2″ (you can use 0 for “as quick as you can” sampling, fractional numbers are also OK, e.g. 0.5)
- optional number of iterations to perform, default is 100000000
- The way of getting the results inside the package is based on the very cool idea by Tanel Poder, that he uses in his publicly available scripts and tools library e.g. in latchprofx.sql, where very quick sampling of X$ table is done by enforced NESTED LOOP join operation
- In this case I’m sampling X$KSLEI to get the performance measures
- I’ve also created a function XX_PERF_WATCHER.SLEEP on top of DBMS_LOCK.SLEEP, so that it could be included in the same NESTED LOOP sampling technique to set the required sampling interval.
- The package works on 10g and 11g (I’ve tested it on 10.2.0.4, 10.2.0.5, 11.1.0.7, 11.2.0.1 and 11.2.0.2), but as the definition of X$KSLEI differs on 10g and 11g, I had to use compiler directives $IF, $ELSIF, $ELSE, $END to separate 10g code from 11g code.
- As the function samples X$ table, I installed the package using SYS, unfortunately I didn’t manage to figure out how to get similar functionality without need of creating any DB objects, so if any of you know some ways, please let me know!
create or replace package xx_perf_watcher authid definer is
function sleep(p_sec number) return number;
function monitor_waits(p_event in varchar2,
p_int_sec number,
p_iter number default 100000000)
return system.PA_VARCHAR2_240_TBL_TYPE
pipelined;
end xx_perf_watcher;
/
create or replace package body xx_perf_watcher is
function sleep(p_sec number) return number is
begin
dbms_lock.sleep(p_sec);
return 1;
end;
function monitor_waits(p_event in varchar2,
p_int_sec number,
p_iter number default 100000000)
return system.PA_VARCHAR2_240_TBL_TYPE
pipelined as
l_tstamp varchar2(22) := to_char(systimestamp,
'DD.MM.YYYY HH24:MI:SS.FF2');
l_total_waits number := 0;
l_time_waited_micro number := 0;
l_avg_wait_time_ms number := 0;
c_total_waits number := 0;
c_time_waited_micro number := 0;
c_avg_wait_time_ms number := 0;
v_dummy number;
v_text varchar2(500);
$IF DBMS_DB_VERSION.VER_LE_9_2 $THEN
null;
$ELSIF DBMS_DB_VERSION.VER_LE_10_2 $THEN
cursor c1 is
select /*+ ORDERED USE_NL(b.s) NO_TRANSFORM_DISTINCT_AGG*/
b.dummy, b.total_waits, b.time_waited_micro
from (select /*+ NO_MERGE */
1
from dual
connect by level <= p_iter) a,
(select xx_perf_watcher.sleep(p_int_sec) dummy,
KSLEDNAM event,
s.ksleswts total_waits,
s.kslestim time_waited_micro
from SYS.X$KSLEI S, SYS.X$KSLED D
where S.KSLESWTS > 0
and D.KSLEDNAM = p_event
AND D.INST_ID = USERENV('INSTANCE')
AND S.INDX = D.INDX) b;
$ELSIF DBMS_DB_VERSION.VER_LE_11_2 $THEN
cursor c1 is
select /*+ ORDERED USE_NL(b.s) NO_TRANSFORM_DISTINCT_AGG*/
b.dummy, b.total_waits, b.time_waited_micro
from (select /*+ NO_MERGE */
1
from dual
connect by level <= p_iter) a,
(select xx_perf_watcher.sleep(p_int_sec) dummy,
KSLEDNAM event,
s.ksleswts_un + s.ksleswts_fg + s.ksleswts_bg total_waits,
s.kslestim_un + s.kslestim_fg + s.kslestim_bg time_waited_micro
from SYS.X$KSLEI S, SYS.X$KSLED D
where (S.KSLESWTS_FG > 0 OR S.KSLESWTS_BG > 0 OR
S.KSLESWTS_UN > 0)
and D.KSLEDNAM = p_event
AND D.INST_ID = USERENV('INSTANCE')
AND S.INDX = D.INDX) b;
$ELSE
null;
$END
begin
$IF DBMS_DB_VERSION.VER_LE_9_2 $THEN
raise_application_error(-20001, 'DB version not supported');
$ELSIF DBMS_DB_VERSION.VER_LE_10_2 $THEN
null;
$ELSIF DBMS_DB_VERSION.VER_LE_11_2 $THEN
null;
$ELSE
raise_application_error(-20001, 'DB version not supported');
$END
v_text := '|' || rpad(' WAITEVENT', length(p_event) + 2, ' ') || '|' ||
rpad(' TIMESTAMP', 24, ' ') || '|' || rpad(' WAITS', 12, ' ') || '|' ||
rpad(' TIME US', 14, ' ') || '|' ||
rpad(' MS/EVENT', 10, ' ') || '|';
pipe row(v_text);
v_text := '|' ||
rpad('-', length(p_event) + 2 + 24 + 12 + 14 + 10 + 4, '-') || '|';
pipe row(v_text);
OPEN c1;
LOOP
FETCH c1
INTO v_dummy, c_total_waits, c_time_waited_micro;
EXIT WHEN c1%NOTFOUND;
v_text := '| ' || rpad(p_event, length(p_event), ' ') || ' | ' ||
to_char(systimestamp, 'DD.MM.YYYY HH24:MI:SS.FF2') || ' | ' ||
lpad(to_char(c_total_waits - l_total_waits), 10, ' ') ||
' | ' ||
lpad(to_char(c_time_waited_micro - l_time_waited_micro),
12,
' ') || ' | ';
if c_total_waits != l_total_waits then
v_text := v_text ||
lpad(to_char(round((c_time_waited_micro -
l_time_waited_micro) /
(c_total_waits - l_total_waits) / 1000,
2)),
8,
' ') || ' |';
else
v_text := v_text || ' |';
end if;
l_total_waits := c_total_waits;
l_time_waited_micro := c_time_waited_micro;
pipe row(v_text);
END LOOP;
CLOSE c1;
return;
end;
end xx_perf_watcher;
/
create public synonym xx_perf_watcher for sys.xx_perf_watcher;
grant execute on xx_perf_watcher to public;
P.S. use syntax like this after you install the package to get it running:
SQL> set lines 2000 pages 0 trimspool on arraysize 1
SQL> select * from table(cast(xx_perf_watcher.monitor_waits('db file sequential read',2) as system.PA_VARCHAR2_240_TBL_TYPE));
Recent Comments