To simplify the case let's assume there are the following 3 tables
A(a_id), B(b_id,val_b), C(a_id,b_id,val_c)
I need to find all a_id that have certain value pairs from B and C. Example find all a_id which have records (val_b='1' and val_c='2' and B.b_id=C.b_id) AND (val_b='3' and val_c='4' and B.b_id=C.b_id) AND ...
select A.a_id
from A
where (A.a_id in
(select C.a_id
from B, C
where B.b_id=C.b_id and B.val_b='1' and C.val_c='2') and
A.a_id in
(select C.a_id
from B, C
where B.b_id=C.b_id and B.val_b='3' and C.val_c='4') and
A.a_id in
(select C.a_id
from B, C
where B.b_id=C.b_id and B.val_b='5' and C.val_c='6'));
What I noticed is that by adding a few more (val_b,val_c) additional pairs postgres takes substantial time to perform the query. To note thatindexes are present for ids, val_b and val_c.
Is there a way to optimize the query? Tried explicit inner joins but didn't help to improve the performance.
Thanks in advance
More info:
- postgres version 8.2.4
- With only one pair criteria runs in 77.621ms
- With 2 pair criteria - 151.588 ms
With 3 pairs of criteria - 49483.979 ms <-- performance sparks crazy
Note that separate subquery itself runs under ~62ms.
Update:
Both separate INTERSECT query version suggested below by Vladimir Baranov and the version with having clause that uses bool_or aggregation function by Clodoaldo Neto performed much better. Thank you !
However, the question remains why postgres 8.2 has such a performance spark with original query starting with 3 pair criteria?
BTW, I noticed the same spark also with Vladimir Baranov`s first suggestion to rewrite the query with clean joins. See below:
SELECT A.a_id
FROM
A
INNER JOIN (SELECT C.a_id FROM B INNER JOIN C ON B.b_id=C.b_id WHERE B.val_b='1' and C.val_c='2') Set1 ON Set1.a_id = A.a_id
INNER JOIN (SELECT C.a_id FROM B INNER JOIN C ON B.b_id=C.b_id WHERE B.val_b='3' and C.val_c='4') Set2 ON Set2.a_id = A.a_id
INNER JOIN (SELECT C.a_id FROM B INNER JOIN C ON B.b_id=C.b_id WHERE B.val_b='5' and C.val_c='6') Set3 ON Set3.a_id = A.a_id
;
With 3 sets th query runs quite fast, but as soon as one adds another 3-4 sets the query performance degrades to ~30-40 secs.