I have an Oracle table that represents parent-child relationships, and I want to improve the performance of a query that searches the hierarchy for an ancestor record. I'm testing with the small data set here, though the real table is much larger:
id name parent_id tagged
== ==== ========= ======
1 One null null
2 Two 1 1
3 Three 2 null
4 Four 3 null
5 Five null null
6 Six 5 1
7 Seven 6 null
8 Eight null null
9 Nine 8 null
parent_id refers back to id in this same table in a foreign key relationship.
I want to write a query that returns each leaf record (those records that have no descendants... id 4 and id 7 in this example) which has an ancestor record that has tagged = 1
(walking back through the parent_id relationship).
So, for the above source data, I want my query to return:
id name tagged_ancestor_id
== ==== ==================
4 Four 2
7 Seven 6
My current query to retrieve these records is:
select * from (
select id,
name,
connect_by_root id tagged_ancestor_id
from mytree
connect by prior id = parent_id
start with tagged is not null
) m1
where not exists (
select * from mytree m2 where m2.parent_id = m1.id
)
This query works fine on this simple little example table, but its performance is terrible on my real table which has about 11,000,000 records. The query takes over a minute to run.
- There are indexes on both fields in the
connect by
clause. - The "tagged" field in the
start with
clause also has an index on it, and there are about 1,500,000 records in my table with non-null values in this field. - The
where
clause doesn't seem to be the problem, because when modify it to return a specific name (also indexed) withwhere name = 'somename'
instead ofwhere not exists ...
, the query still takes about the same amount of time.
So, what are some strategies I can use to try to make these types queries on this hierarchy run faster?