
We have been asked to identify toxic queries that are fired at the database and kill them so that the application is not affected by them. For instance, a long-running query that is consuming a lot of CPU should be killed. I have done some research and found that something called WLM (Workload Manager) is available in the DB2 enterprise edition. I have also discussed this with a few DBAs and understood that the following are some of the parameters that can be monitored through WLM:

ESTIMATEDSQLCOST, ACTIVITYTOTALTIME, SQLTEMPSPACE, UOWTOTALTIME

While I continue to learn more about those, can someone shed some light or share their expertise on which parameters can be used to identify such toxic queries that would impact other operations?

Sudersan

1 Answer


The question is a bit general, but I can give some guidance on a few scenarios that you might be interested in:

"Toxic" queries executing in the database right now.

For those you can use MON_GET_ACTIVITY. It gives you detailed metrics for all activities in the database. Bad SQL can typically be identified by:

- long execution time (TOTAL_ACT_TIME)
- a high number of rows read (ROWS_READ), or even better, a high ratio of ROWS_READ to ROWS_RETURNED (e.g. we expect SELECT * to read a lot of rows, but it will also return that many)
- a high ratio of data page reads (POOL_DATA_L_READS) to index page reads (POOL_INDEX_L_READS), which typically indicates the query could benefit from better indexing

An example query could look like this:

db2 "select local_start_time, application_handle, total_act_time, rows_read, rows_returned, pool_index_l_reads, pool_data_l_reads, substr(stmt_text,1,100) as stmt_text from table(mon_get_activity(null, -2)) where member=coord_partition_num order by total_act_time desc"

LOCAL_START_TIME    APP_HANDLE TOTAL_ACT_TIME  ROWS_READ    ROWS_RETURNED POOL_I_L_READS POOL_D_L_READS STMT_TEX
------------------- ---------- -------------- ----------    ------------- -------------- --------------- -------
2019-09-26-10.31.57    3640333       66633923      78863           629729             32               0 SELECT 
2019-09-26-10.31.57    2329627       66627534     225535           629729             32               0 SELECT 
2019-09-26-10.31.57    1019395       66613118      95760           629729             18               0 SELECT 
2019-09-26-10.31.57    3640332       66608933      32607           302242              4               0 SELECT 

(Of course there are more interesting metrics available; I've just shown a few.) Once you identify the query, you can include the EXECUTABLE_ID column and use EXPLAIN_FROM_SECTION to generate an explain plan.
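A minimal sketch of that last step, assuming the explain tables already exist (they can be created with SYSINSTALLOBJECTS). Here <executable-id> is a placeholder for the binary EXECUTABLE_ID value returned by MON_GET_ACTIVITY, <dbname> is your database name, and 'M' tells the procedure to read the section from memory:

-- paste the EXECUTABLE_ID literal (e.g. x'0100...') in place of <executable-id>
db2 "CALL EXPLAIN_FROM_SECTION(<executable-id>, 'M', NULL, 0, NULL, ?, ?, ?, ?, ?)"
db2exfmt -d <dbname> -1 -o plan.out

The five ? markers are output parameters identifying the explain instance just written to the explain tables; db2exfmt (with -1 for default options) then formats the most recently captured plan into plan.out.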

2. Queries that have been executed in the past and are still in the package cache.

For those you can use MON_GET_PKG_CACHE_STMT with counters similar to those in scenario 1. Note that this function returns cumulative metrics for all executions, so you might want to divide the numbers by NUM_EXECUTIONS, as in the sketch below.
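For example, a query like the following could rank dynamic statements by average rows read per execution (the 'D' argument limits the result to dynamic SQL, and NULLIF guards against dividing by zero for statements that have not completed an execution yet):

db2 "select num_executions,
            total_act_time / nullif(num_executions, 0) as avg_act_time,
            rows_read / nullif(num_executions, 0) as avg_rows_read,
            substr(stmt_text, 1, 100) as stmt_text
     from table(mon_get_pkg_cache_stmt('D', null, null, -2))
     order by avg_rows_read desc
     fetch first 10 rows only"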

3. Queries that might be executed in the future.

To narrow down such queries you can use CREATE THRESHOLD and make Db2 collect diagnostics (COLLECT ACTIVITY DATA) or even access plans (WITH DETAILS, SECTION) once a specific query exceeds a certain limit (SQLROWSREAD, ACTIVITYTOTALRUNTIME, etc.). Moreover, you can have Db2 automatically interrupt such queries (STOP EXECUTION); see the sketch below.
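As an illustrative sketch (the threshold name and the limit are made up for this example), the following would stop any statement that reads more than 100 million rows, capturing its details and section first. Note that actually persisting the collected activity data also requires an active ACTIVITIES event monitor:

db2 "CREATE THRESHOLD STOP_HEAVY_READERS
       FOR DATABASE ACTIVITIES
       ENFORCEMENT DATABASE
       WHEN SQLROWSREAD > 100000000 CHECKING EVERY 60 SECONDS
       COLLECT ACTIVITY DATA WITH DETAILS, SECTION
       STOP EXECUTION"

An application whose query is stopped this way receives SQL4712N, while the rest of the workload keeps running.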

kkuduk