Home > Tools, Troubleshooting > xx_perf_watcher for monitoring waits during short time intervals

xx_perf_watcher for monitoring waits during short time intervals

A while ago I worked on a performance troubleshooting case where frequent short time degradation of IO performance on NetApp storage was suspected to be the root cause. The problem was to get some proof as looking at the averages of IO service time was not alarming enough. I decided to write a tool that could be used to monitor wait times for any DB wait event in short intervals, e.g. so I could get measures for db file sequential/scattered read performance each second. I thought for a while and figured out the requirements:

  • It should be lightweight and easy to set up;
  • the results have to be visible real-time;
  • it should be possible to spool results to file;
  • it should allow monitoring any wait event;
  • it should allow to define interval length between measurement points.

In the end I came up with the solution – a pipelined function, with interval size and name of wait event as parameters, that “queries” the performance metrics using a simple select statement, making it possible to spool the results into a file and seeing results real-time. You can take a look at it in the video below. Continue reading if you’re interested in seeing the source code and reading some explanations on key implementation tricks that made this possible.

OK, here are some details on how the tool is written:

  • It’s a pipelined function that returns SYSTEM.PA_VARCHAR2_240_TBL_TYPE type result, I chose this type as it was available on 10g and 11g and this way I didn’t need to create my own data types. Function is pipelined therefore it allows  constructing a SELECT statement to return data from function that passes the results via PIPE ROW statement. Choosing this construction allows to use query syntax to get the results, note setting arraysize to 1 is really important to see the real-time results, SQL*Plus is fetching rows in arraysize sets and not displaying them until the whole set is received.

SQL> set lines 2000 pages 0 trimspool on arraysize 1
SQL> select * from table(cast(xx_perf_watcher.monitor_waits('db file sequential read',2) as system.PA_VARCHAR2_240_TBL_TYPE));
| WAITEVENT               | TIMESTAMP              | WAITS      | TIME US      | MS/EVENT |
|-----------------------------------------------------------------------------------------|
| db file sequential read | 16.06.2011 11:27:34.13 |   95030085 | 412378973106 |     4.34 |
| db file sequential read | 16.06.2011 11:27:34.32 |         21 |        82193 |     3.91 |
| db file sequential read | 16.06.2011 11:27:34.49 |         28 |        50584 |     1.81 |
| db file sequential read | 16.06.2011 11:27:34.67 |         13 |        35691 |     2.75 |
| db file sequential read | 16.06.2011 11:27:34.88 |         34 |        63947 |     1.88 |
| db file sequential read | 16.06.2011 11:27:35.06 |          6 |        21738 |     3.62 |
| db file sequential read | 16.06.2011 11:27:35.24 |          9 |         7127 |      .79 |
| db file sequential read | 16.06.2011 11:27:35.42 |          8 |        19406 |     2.43 |
| db file sequential read | 16.06.2011 11:27:35.59 |          2 |         1653 |      .83 |
| db file sequential read | 16.06.2011 11:27:35.77 |         47 |       120500 |     2.56 |
| db file sequential read | 16.06.2011 11:27:35.96 |         43 |       127720 |     2.97 |
| db file sequential read | 16.06.2011 11:27:36.15 |         45 |       109652 |     2.44 |

  • function XX_PERF_WATCHER.MONITOR_WAITS requires 3 arguments:
    • name of the wait event to be monitored, e.g. “db file sequential read”
    • length of the interval in seconds, e.g. “2″ (you can use 0 for “as quick as you can” sampling, fractional numbers are also OK, e.g. 0.5)
    • optional number of iterations to  perform, default is 100000000
  • The way of getting the results inside the package is based on the very cool idea by Tanel Poder, that he uses in his publicly available scripts and tools library e.g. in latchprofx.sql, where very quick sampling of X$ table is done by enforced NESTED LOOP join operation
    • In this case I’m sampling X$KSLEI to get the performance measures
    • I’ve also created a function XX_PERF_WATCHER.SLEEP on top of DBMS_LOCK.SLEEP, so that it could be included in the same NESTED LOOP sampling technique to set the required sampling interval.
  • The package works on 10g and 11g (I’ve tested it on 10.2.0.4, 10.2.0.5, 11.1.0.7, 11.2.0.1 and 11.2.0.2), but as the definition of X$KSLEI differs on 10g and 11g, I had to use compiler directives $IF, $ELSIF, $ELSE, $END to separate 10g code from 11g code.
  • As the function samples X$ table, I installed the package using SYS, unfortunately I didn’t manage to figure out how to get similar functionality without need of creating any DB objects, so if any of you know some ways, please let me know!
And finally here is the full code of the pl/sql package (click on the “show source” link), just remember, try it out in the test environment if you decide to give it a go!

create or replace package xx_perf_watcher authid definer is

  function sleep(p_sec number) return number;
  function monitor_waits(p_event   in varchar2,
                         p_int_sec number,
                         p_iter    number default 100000000)
    return system.PA_VARCHAR2_240_TBL_TYPE
    pipelined;

end xx_perf_watcher;
/

create or replace package body xx_perf_watcher is
  function sleep(p_sec number) return number is
  begin
    dbms_lock.sleep(p_sec);
    return 1;
  end;

  function monitor_waits(p_event   in varchar2,
                         p_int_sec number,
                         p_iter    number default 100000000)
    return system.PA_VARCHAR2_240_TBL_TYPE
    pipelined as

    l_tstamp            varchar2(22) := to_char(systimestamp,
                                                'DD.MM.YYYY HH24:MI:SS.FF2');
    l_total_waits       number := 0;
    l_time_waited_micro number := 0;
    l_avg_wait_time_ms  number := 0;
    c_total_waits       number := 0;
    c_time_waited_micro number := 0;
    c_avg_wait_time_ms  number := 0;
    v_dummy             number;
    v_text              varchar2(500);

    $IF DBMS_DB_VERSION.VER_LE_9_2 $THEN
    null;
    $ELSIF DBMS_DB_VERSION.VER_LE_10_2 $THEN
    cursor c1 is
      select /*+ ORDERED USE_NL(b.s) NO_TRANSFORM_DISTINCT_AGG*/
       b.dummy, b.total_waits, b.time_waited_micro
        from (select /*+ NO_MERGE */
               1
                from dual
              connect by level <= p_iter) a,
             (select xx_perf_watcher.sleep(p_int_sec) dummy,
                     KSLEDNAM event,
                     s.ksleswts total_waits,
                     s.kslestim time_waited_micro
                from SYS.X$KSLEI S, SYS.X$KSLED D
               where S.KSLESWTS > 0
                 and D.KSLEDNAM = p_event
                 AND D.INST_ID = USERENV('INSTANCE')
                 AND S.INDX = D.INDX) b;
    $ELSIF DBMS_DB_VERSION.VER_LE_11_2 $THEN
    cursor c1 is
      select /*+ ORDERED USE_NL(b.s) NO_TRANSFORM_DISTINCT_AGG*/
       b.dummy, b.total_waits, b.time_waited_micro
        from (select /*+ NO_MERGE */
               1
                from dual
              connect by level <= p_iter) a,
             (select xx_perf_watcher.sleep(p_int_sec) dummy,
                     KSLEDNAM event,
                     s.ksleswts_un + s.ksleswts_fg + s.ksleswts_bg total_waits,
                     s.kslestim_un + s.kslestim_fg + s.kslestim_bg time_waited_micro
                from SYS.X$KSLEI S, SYS.X$KSLED D
               where (S.KSLESWTS_FG > 0 OR S.KSLESWTS_BG > 0 OR
                     S.KSLESWTS_UN > 0)
                 and D.KSLEDNAM = p_event
                 AND D.INST_ID = USERENV('INSTANCE')
                 AND S.INDX = D.INDX) b;
    $ELSE
    null;
    $END
  begin
    $IF DBMS_DB_VERSION.VER_LE_9_2 $THEN
    raise_application_error(-20001, 'DB version not supported');
    $ELSIF DBMS_DB_VERSION.VER_LE_10_2 $THEN
    null;
    $ELSIF DBMS_DB_VERSION.VER_LE_11_2 $THEN
    null;
    $ELSE
    raise_application_error(-20001, 'DB version not supported');
    $END

    v_text := '|' || rpad(' WAITEVENT', length(p_event) + 2, ' ') || '|' ||
              rpad(' TIMESTAMP', 24, ' ') || '|' || rpad(' WAITS', 12, ' ') || '|' ||
              rpad(' TIME US', 14, ' ') || '|' ||
              rpad(' MS/EVENT', 10, ' ') || '|';
    pipe row(v_text);
    v_text := '|' ||
              rpad('-', length(p_event) + 2 + 24 + 12 + 14 + 10 + 4, '-') || '|';
    pipe row(v_text);
    OPEN c1;
    LOOP
      FETCH c1
        INTO v_dummy, c_total_waits, c_time_waited_micro;
      EXIT WHEN c1%NOTFOUND;
      v_text := '| ' || rpad(p_event, length(p_event), ' ') || ' | ' ||
                to_char(systimestamp, 'DD.MM.YYYY HH24:MI:SS.FF2') || ' | ' ||
                lpad(to_char(c_total_waits - l_total_waits), 10, ' ') ||
                ' | ' ||
                lpad(to_char(c_time_waited_micro - l_time_waited_micro),
                     12,
                     ' ') || ' | ';
      if c_total_waits != l_total_waits then
        v_text := v_text ||
                  lpad(to_char(round((c_time_waited_micro -
                                     l_time_waited_micro) /
                                     (c_total_waits - l_total_waits) / 1000,
                                     2)),
                       8,
                       ' ') || ' |';
      else
        v_text := v_text || '         |';
      end if;
      l_total_waits       := c_total_waits;
      l_time_waited_micro := c_time_waited_micro;
      pipe row(v_text);
    END LOOP;
    CLOSE c1;
    return;
  end;
end xx_perf_watcher;
/

create public synonym xx_perf_watcher for sys.xx_perf_watcher;
grant execute on xx_perf_watcher to public;

P.S. use syntax like this after you install the package to get it running:

SQL> set lines 2000 pages 0 trimspool on arraysize 1
SQL> select * from table(cast(xx_perf_watcher.monitor_waits('db file sequential read',2) as system.PA_VARCHAR2_240_TBL_TYPE));

Advertisement
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.