extremly disproportionate running time for query. Trying to understand RDBMS

Question

I am working with a customers database (db2) where our company only has read access. Hence I don't have any control over indices, schema and the like. I want to emphasize this point, because there still might be indices I just don't know where. So I have to optimize our query differently. Anyway, I'm more interested in understanding how these things work rather than in the workaround (although I'm interested in that too). I expected DBs to be fast for these kind of operations but apparently they are not. Or maybe I am running into a serious pitfall.

The scenario: I dissected our query and I stripped it down to the following which is incomprehensible to me. Let us call this query q1 which is of the form

select c.cid, c.bid, c.ryear, t.tyear, td.nr
 from mySchema.cTable c
 join mySchema.dTable d on d.cid = c.cid
 join mySchema.ipTable ip on (ip.did = d.did and ip.type = 'type_s')
 join mySchema.tTable t on t.xtime = ip.xtime
 join mySchema.tdTable td on c.tdid = 'type_'||td.oid
where
 c.ryear = 2009
 and d.rr = 'ugk'
 and d.stat = 'stat#1'
;

this query retrieves about 8000 entries and takes over a minute to execute.

Now the odd part: if I remove any single of the conditions in the where clause, the query takes less than 2 secs to execute. Just for completeness: In the case where I remove ryear the result set returns about 10000 entries. If I remove the d.stat = 'stat#1' the result set contains about 45000 entries and if I remove the d.rr = 'ugk' I actually get the exact same results as if I don't remove it (i.e. in this specific case this condition is redundant for the result set).

Can anyone explain me how/why this enormous time-difference happens if I have all 3 conditions? If I simply put 2 conditions and then grep on the cli for the third one I'm considerably faster. What the heck is the DB doing?

Note: I'm using DbVisualizer to run my queries. When I say execute in time x, I mean the execution time of the query, not the fetching time (see: http://www.dbvis.com/forum/thread.jspa?threadID=1536). Although the fetching time also increases drastically if I have all 3 statements (about 10 minutes!) to retrieve the exact same results as if I exclude the and d.rr = 'ugk' part. For the case where I only have 2 conditions, the fetch time is at most 10 secs (namely when retrieving 45000 entries). For the sake of completeness, if I have no where at all I get 130000 entries (exec: 1.8secs, fetch 28secs)

My question: What is going on? Can a single statement in the where-clause blow the dbs execution plan up so badly?

p.s.: I'm sorry the name of the tables/columns are not more expressive, but I had to obfuscate them a bit for obvious reasons. p.p.s.: I'd be glad to edit the title if you can find something more suited.

too many joins + with many data + non-optimal conditions = bad performance — Nikos M., May 29 '15 at 16:00
At minimum, the concatenation in `c.tdid = 'type_' || td.oid` probably isn't helping any (likely invalidates the use of an index on the join). How much this effects the behavior here isn't completely obvious... — Clockwork-Muse, May 30 '15 at 03:07
A couple of additional side notes: 1) Extreme running time may indicate index stats are out of date, a specific command can be run to update these if necessary (obviously, you can't run it yourself). 2) You can query the [information schema about indices](http://stackoverflow.com/a/19256135/812837), you almost certainly need this info to intelligently plan future queries. For that matter, DBVIS should tell you this information too (as part of the EXPLAIN plan). — Clockwork-Muse, Jun 01 '15 at 09:18

score 0 · Answer 1 · answered May 29 '15 at 19:00

0

A slow query is usually indicative of a full table scan. You generally everything to come through an index rather than a table scan. There may be tools in IBM Data Studio to help you identify the cause of the table scan.

I would suggest the following indexes:

SET SCHEMA mySchema;
CREATE INDEX cTable_ryear ON cTable(ryear) INCLUDE (cid, bid, tdid);
CREATE INDEX dTable_rr_stat_cid ON dTable(rr, stat, cid) INCLUDE (did);
CREATE INDEX ipTable_did_type ON ipTable(did, type) INCLUDE (xtime);
CREATE INDEX tTable_xtime ON tTable(xtime) INCLUDE (tyear);

There's likely a way to optimize access to tdTable as well.

answered May 29 '15 at 19:00

Leo

1,493
14
27

One over the first things I said is, I only have read access. Hence I can't create indices. – dingalapadum May 29 '15 at 19:13
@dingalapadum - at some point, you either have to accept the cost of running things as they are, or spend resources (db space). If this commonly run, your DBAs should be able to help you. – Clockwork-Muse May 30 '15 at 03:10

score -1 · Answer 2 · answered May 29 '15 at 19:23

-1

Can you tell us what indexes are in place (SYSIBM.SYSINDEXES)?

Perhaps you can split the query in 2 by using a WITH aka temporary table...

answered May 29 '15 at 19:23

Stavr00

3,219
1
16
28

extremly disproportionate running time for query. Trying to understand RDBMS

2 Answers2