I have a table with hierarchical data as follows.

create table tst as 
select 1 id, null parent_id from dual union all
select 2 id, 1 parent_id from dual union all
select 3 id, 1 parent_id from dual union all
select 4 id, 2 parent_id from dual union all
select 5 id, 3 parent_id from dual union all
select 6 id, 5 parent_id from dual union all
select 7 id, 6 parent_id from dual union all
select 8 id, 6 parent_id from dual;

Traversing the hierarchy with a CONNECT BY query is trivial.
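For reference, a minimal sketch of such a plain traversal over the tst table above. The column aliases lvl and path are chosen here for illustration only and do not come from the original post.

```sql
-- Walk the tree from the root (the row without a parent) downwards.
select id,
       parent_id,
       level                                    as lvl,   -- depth, root = 1
       ltrim(sys_connect_by_path(id, ','), ',') as path   -- IDs from the root to this node
from   tst
start with parent_id is null
connect by parent_id = prior id
order siblings by id;
```

This returns every node unchanged; it does nothing yet about merging the single-child chains.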

The extra requirement I have is to collapse the simple (bamboo-like) parts of the tree: if a parent has only one child, the two are merged and their IDs are concatenated (this rule is applied recursively).

So the expected result is:

ID         PARENT_ID
---------- ----------
1
2,4        1
3,5,6      1
7          3,5,6
8          3,5,6

UPDATE: alternatively, the following is also a correct answer (it adds the concatenated node list and reuses the original IDs):

        ID  PARENT_ID NODE_LST 
---------- ---------- ---------
         1            1       
         4          1 2,4     
         6          1 3,5,6   
         7          6 7       
         8          6 8

So far I have managed to count the children per parent and to build, for each node, the complete path from the root of both the child counts and the IDs:

with child_cnt as (
-- child count per parent
select parent_id, count(*) cnt 
from tst
where parent_id is not NULL
group by parent_id),
tst2 as (
select 
  ID,  child_cnt.cnt,
  tst.parent_id
from tst left outer join child_cnt on tst.parent_id = child_cnt.parent_id),
tst3 as (
SELECT id, parent_id,
  sys_connect_by_path(cnt,',') child_cnt_path,
  sys_connect_by_path(id,',') path
FROM tst2
  START WITH parent_id IS NULL
  CONNECT BY  parent_id  = PRIOR id
)
select * from tst3
;


        ID  PARENT_ID CHILD_CNT_PATH PATH       
---------- ---------- -------------- ------------
         1            ,              ,1           
         2          1 ,,2            ,1,2         
         4          2 ,,2,1          ,1,2,4       
         3          1 ,,2            ,1,3         
         5          3 ,,2,1          ,1,3,5       
         6          5 ,,2,1,1        ,1,3,5,6     
         7          6 ,,2,1,1,2      ,1,3,5,6,7   
         8          6 ,,2,1,1,2      ,1,3,5,6,8   

This suggests that for IDs 4 and 5 one level has to be skipped (one trailing child count of 1), and for ID 6 two levels (two trailing ones in the count path).
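The observation about trailing ones can be sketched as a query. This is only an illustration of the idea, not a full solution: it repeats the CTEs from above and adds a hypothetical column skip_levels, computed with a regular expression that matches the run of ",1" entries at the end of the count path.

```sql
with child_cnt as (
  -- child count per parent
  select parent_id, count(*) cnt
  from   tst
  where  parent_id is not null
  group by parent_id
),
tst2 as (
  -- attach to each node the number of children its parent has
  select tst.id, tst.parent_id, child_cnt.cnt
  from   tst
         left outer join child_cnt
              on tst.parent_id = child_cnt.parent_id
),
tst3 as (
  select id, parent_id,
         sys_connect_by_path(cnt, ',') child_cnt_path
  from   tst2
  start with parent_id is null
  connect by parent_id = prior id
)
select id,
       child_cnt_path,
       -- every trailing ",1" is two characters long and stands for one level to skip;
       -- regexp_substr returns NULL when the path does not end in ",1"
       nvl(length(regexp_substr(child_cnt_path, '(,1)+$')), 0) / 2 as skip_levels
from   tst3
order by id;
```

For the sample data this should give skip_levels = 1 for IDs 4 and 5, 2 for ID 6, and 0 for the remaining nodes, in line with the reading of the count path above.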

But I think there should be a simpler approach to solve this.

Marmite Bomber

2 Answers

This query will get you to the alternative solution.

While there may be some further optimisations or bugs to be fixed, it works for your test case.

WITH nodes_to_dispose as (
    SELECT min(id) as id,
           parent_id
    FROM tst
    WHERE parent_id is not null
    GROUP BY parent_id
    HAVING count(*) = 1 )
-- This part returns merged bamboo nodes
SELECT nodes_to_dispose.id,
       connect_by_root tst.parent_id as parent_id,
       connect_by_root nodes_to_dispose.parent_id ||
               sys_connect_by_path(nodes_to_dispose.id, ',') as node_lst
FROM nodes_to_dispose, tst
WHERE nodes_to_dispose.parent_id = tst.id (+)
AND connect_by_isleaf = 1
START WITH nodes_to_dispose.parent_id not in (
    SELECT id
    FROM nodes_to_dispose )
CONNECT BY prior nodes_to_dispose.id = nodes_to_dispose.parent_id
UNION
-- This part returns all other nodes in their original form
SELECT id, parent_id, to_char(id) as node_lst
FROM tst
WHERE id not in (
    SELECT parent_id
    FROM nodes_to_dispose
    UNION
    SELECT id
    FROM nodes_to_dispose);
Peter M.

This isn't very elegant, but it should work. I'll edit if I can figure out a better way to do the final part. Good luck!

with
     d ( id, parent_id, degree ) as (
       select id, parent_id, count(parent_id) over (partition by parent_id)
       from   tst
     ),
     x ( old_id, new_id ) as (
       select id, ltrim(sys_connect_by_path(id, ','), ',')
       from   d
       where connect_by_isleaf = 1
       start with degree != 1
       connect by parent_id = prior id
       and        degree = 1
     )
select x1.new_id as id, x2.new_id as parent_id
from   x x1 
            inner join tst 
                 on tst.id        = to_number(regexp_substr(x1.new_id, '^[^,]+'))
            left outer join x x2
                 on tst.parent_id = x2.old_id
;
  • Nice idea of connecting the nodes with degree one. I'll accept tomorrow after rethinking it:) – Marmite Bomber Jul 24 '16 at 19:05
  • I couldn't come up with a way to avoid the two joins at the end. It would be nice if I could keep track of the parent of the first node in a bamboo part without requiring a join (lookup in the original table); but everything I thought of in that regard would make it much more cumbersome to identify the bamboo parts in the first place. Good luck! –  Jul 25 '16 at 00:16
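One possible way to keep track of the parent of the first node in a bamboo part, as wished for in the comment above, is Oracle's CONNECT_BY_ROOT operator. It returns a column value from the row the current chain started with. The sketch below is a variation of the second answer rather than the answerer's own code, and the column name chain_parent_id is an assumption introduced here. With it, the lookup in tst is no longer needed and a single self-join remains.

```sql
with
     d ( id, parent_id, degree ) as (
       -- degree = number of children the node's parent has (0 for the root)
       select id, parent_id, count(parent_id) over (partition by parent_id)
       from   tst
     ),
     x ( old_id, new_id, chain_parent_id ) as (
       select id,                                        -- last node of the chain
              ltrim(sys_connect_by_path(id, ','), ','),  -- concatenated IDs of the chain
              connect_by_root parent_id                  -- parent of the chain's first node
       from   d
       where  connect_by_isleaf = 1
       start with degree != 1
       connect by parent_id = prior id
       and        degree = 1
     )
select x1.new_id as id, x2.new_id as parent_id
from   x x1
       left outer join x x2
            on x1.chain_parent_id = x2.old_id
order by x1.old_id;
```

On the sample data this should return the same five rows as the expected result in the question. It has not been checked against trees of other shapes, so treat it as a starting point.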