Performance issues in SQL query with a hierarchical relationship

Question

I have an Oracle table that represents parent-child relationships, and I want to improve the performance of a query that searches the hierarchy for an ancestor record. I'm testing with the small data set here, though the real table is much larger:

id   name    parent_id   tagged
==   ====    =========   ======
1    One     null        null  
2    Two     1           1
3    Three   2           null
4    Four    3           null
5    Five    null        null
6    Six     5           1
7    Seven   6           null
8    Eight   null        null
9    Nine    8           null

parent_id refers back to id in this same table in a foreign key relationship.

I want to write a query that returns each leaf record (those records that have no descendants... id 4 and id 7 in this example) which has an ancestor record that has tagged = 1 (walking back through the parent_id relationship).

So, for the above source data, I want my query to return:

id   name    tagged_ancestor_id
==   ====    ==================  
4    Four    2
7    Seven   6

My current query to retrieve these records is:

select * from (
    select id, 
           name,
           connect_by_root id tagged_ancestor_id    
    from mytree 
    connect by prior id = parent_id
    start with tagged is not null
) m1            
where not exists (
    select * from mytree m2 where m2.parent_id = m1.id
)

This query works fine on this simple little example table, but its performance is terrible on my real table which has about 11,000,000 records. The query takes over a minute to run.

There are indexes on both fields in the connect by clause.
The "tagged" field in the start with clause also has an index on it, and there are about 1,500,000 records in my table with non-null values in this field.
The where clause doesn't seem to be the problem, because when modify it to return a specific name (also indexed) with where name = 'somename' instead of where not exists ..., the query still takes about the same amount of time.

So, what are some strategies I can use to try to make these types queries on this hierarchy run faster?

If you can modify your schema you can reconstruct it to store a modified preorder tree traversal, but you will need to add 2 additional columns (a left and right value). I have done this with success in mysql several years ago following a tutorial I can't presently locate. :( You should get some extra performance at the cost of space. — ozborn, Dec 21 '15 at 20:57
I do have control over the schema, so adding some columns would be ok if it is the best way to improve this performance.I guess I could just stash the one tagged ancestor in its own field and be done with it, but it seemed like this query could be made faster. — Chris Farmer, Dec 21 '15 at 21:06
Here's the appropriate wikipedia reference: https://en.wikipedia.org/wiki/Nested_set_model I'm not sure I understand how stash one tagged ancestor in its own field is going to help you (unless you store all of them?) but I am probably not understanding your question...? — ozborn, Dec 21 '15 at 21:15

score 1 · Answer 1 · edited May 23 '17 at 12:30

Here is what I would check first:

Make sure your table has a primary key.
Make sure the statistics on the table are current. Use DBMS_STATS.GATHER_TABLE_STATS to collect the statistics. See this URL: (for ORACLE version 11.1):

http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_stats.htm

Even if you have indexes on both fields individually, you still need
an index on the 2 fields combined; Create an index on the ID and PARENT_ID:

CREATE INDEX on TABLE_NAME(ID, PARENT_ID);

See this URL:

Optimizing Oracle CONNECT BY when used with WHERE clause

Make sure the underlying table does not have row chaining or other problems (E.G. corruption).

Make sure the table and all indexes are in the same tablespace.

Thanks for the suggestions. I'll check these out. – Chris Farmer Dec 21 '15 at 21:36 — Chris Farmer, Dec 21 '15 at 21:36

xQbert · Answer 2 · 2015-12-21T22:30:28.867

I'm not sure if this is any faster without the volume of data to test with... but something to consider. I guess I'm hoping by starting with only those that are tagged, and only those that are leafs we are dealing with a smaller volume to process which may result in a performance gain. but the overhead for the string manipulation seems hackish.

  with cte(id, name, parent_id, tagged) as (
SELECT 1, 'ONE', null, null from dual union all
SELECT 2, 'TWO', 1, 1 from dual union all
SELECT 3, 'THREE', 2, null from dual union all
SELECT 4, 'FOUR', 3, null from dual union all
select 5, 'FIVE', null, null from dual union all
select 6, 'SIX', 5, 1 from dual union all
select 7, 'SEVEN', 6, null from dual union all
select 8, 'EIGHT', null, null from dual union all
select 9, 'NINE', 8, null from dual),

Leafs(id, name) as (select id, Name
           from cte
where connect_by_isleaf = 1
Start with parent_Id is null
connect by nocycle prior id =parent_id),

Tagged as (SELECT id, name, SYS_CONNECT_BY_PATH(ID, '/') Path, substr(SYS_CONNECT_BY_PATH(ID, '/'),2,instr(SYS_CONNECT_BY_PATH(ID, '/'),'/',2)-2) as Leaf
from cte
where tagged=1
start with id in (select id from leafs)
connect by nocycle prior parent_id = id)

select l.*, T.ID as Tagged_ancestor from leafs L
inner join tagged t
 on l.id = t.leaf

In essence I created 3 cte's one for the data (Cte) one for the leafs(leafs) and one for the tagged records (tagged)

We traverse the hierarchy twice. Once to get all the leafs, once to get all the tagged. We then parse out the first leaf value from the tagged hierarchy and join it back to leafs to get the leafs related to tagged records.

As to if this is faster than what you're doing... Shrug I didn't want to spend the time testing since I don't have your indexes nor do I have your data volume

Performance issues in SQL query with a hierarchical relationship

2 Answers2