I am working with a customers database (db2) where our company only has read access. Hence I don't have any control over indices, schema and the like. I want to emphasize this point, because there still might be indices I just don't know where. So I have to optimize our query differently. Anyway, I'm more interested in understanding how these things work rather than in the workaround (although I'm interested in that too). I expected DBs to be fast for these kind of operations but apparently they are not. Or maybe I am running into a serious pitfall.
The scenario:
I dissected our query and I stripped it down to the following which is incomprehensible to me. Let us call this query q1
which is of the form
select c.cid, c.bid, c.ryear, t.tyear, td.nr
from mySchema.cTable c
join mySchema.dTable d on d.cid = c.cid
join mySchema.ipTable ip on (ip.did = d.did and ip.type = 'type_s')
join mySchema.tTable t on t.xtime = ip.xtime
join mySchema.tdTable td on c.tdid = 'type_'||td.oid
where
c.ryear = 2009
and d.rr = 'ugk'
and d.stat = 'stat#1'
;
this query retrieves about 8000 entries and takes over a minute to execute.
Now the odd part: if I remove any single of the conditions in the where
clause, the query takes less than 2 secs to execute. Just for completeness: In the case where I remove ryear the result set returns about 10000 entries. If I remove the d.stat = 'stat#1'
the result set contains about 45000 entries and if I remove the d.rr = 'ugk'
I actually get the exact same results as if I don't remove it (i.e. in this specific case this condition is redundant for the result set).
Can anyone explain me how/why this enormous time-difference happens if I have all 3 conditions? If I simply put 2 conditions and then grep on the cli for the third one I'm considerably faster. What the heck is the DB doing?
Note: I'm using DbVisualizer to run my queries. When I say execute in time x, I mean the execution time of the query, not the fetching time (see: http://www.dbvis.com/forum/thread.jspa?threadID=1536). Although the fetching time also increases drastically if I have all 3 statements (about 10 minutes!) to retrieve the exact same results as if I exclude the and d.rr = 'ugk'
part. For the case where I only have 2 conditions, the fetch time is at most 10 secs (namely when retrieving 45000 entries). For the sake of completeness, if I have no where
at all I get 130000 entries (exec: 1.8secs, fetch 28secs)
My question: What is going on? Can a single statement in the where-clause blow the dbs execution plan up so badly?
p.s.: I'm sorry the name of the tables/columns are not more expressive, but I had to obfuscate them a bit for obvious reasons. p.p.s.: I'd be glad to edit the title if you can find something more suited.