I have hierarchical data stored in a table (right), which forms the hierarchy shown on the left. The tables are kept in Oracle 11g.

TREE Hierarchy          Tree Table  
--------------          Element Parent
                        ------  ------
P0                      P0  
    P1                  P1      P0
        P11             P2      P0
            C111        P11     P1
            C112        P12     P1
        P12             P21     P2
            C121        P22     P2
            C122        C111    P11
    P2                  C112    P11
        P21             C121    P12
            C211        C122    P12
            C212        C211    P21
        P22             C212    P21
            C221        C221    P22
            C222        C222    P22

My data table holds the following values. It contains a value for every leaf node (plus, in this sample, the parent P11).
Data Table

Element Value  
C111    3  
C112    3  
C121    3  
C122    3  
C211    3  
C212    3  
C221    3  
C222    3  
P11     6  

I need to generate an INSERT statement, preferably a single one, that inserts rows into the data table based on the sum of the children's values. Please note that the sum should be calculated only for those parents whose value is not already present in the data table.

Data Table (Expected After Insert)

Element Value
C111    3
C112    3
C121    3
C122    3
C211    3
C212    3
C221    3
C222    3
P11     6

-- Rows to insert
P12     6
P21     6
P22     6
P1      12
P2      12
P0      24
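
For anyone who wants to reproduce this, here is a minimal setup sketch consistent with the listings above. The table and column names come from the listings; the column types (VARCHAR2(10), NUMBER) and the primary keys are assumptions, not taken from my real schema.

```sql
-- Assumed DDL: types and primary keys are illustrative only.
CREATE TABLE tree (
  element VARCHAR2(10) PRIMARY KEY,
  parent  VARCHAR2(10)            -- NULL for the root
);

CREATE TABLE data (
  element VARCHAR2(10) PRIMARY KEY,
  value   NUMBER
);

-- Hierarchy rows, exactly as in the Tree Table listing.
INSERT INTO tree (element, parent) VALUES ('P0',   NULL);
INSERT INTO tree (element, parent) VALUES ('P1',   'P0');
INSERT INTO tree (element, parent) VALUES ('P2',   'P0');
INSERT INTO tree (element, parent) VALUES ('P11',  'P1');
INSERT INTO tree (element, parent) VALUES ('P12',  'P1');
INSERT INTO tree (element, parent) VALUES ('P21',  'P2');
INSERT INTO tree (element, parent) VALUES ('P22',  'P2');
INSERT INTO tree (element, parent) VALUES ('C111', 'P11');
INSERT INTO tree (element, parent) VALUES ('C112', 'P11');
INSERT INTO tree (element, parent) VALUES ('C121', 'P12');
INSERT INTO tree (element, parent) VALUES ('C122', 'P12');
INSERT INTO tree (element, parent) VALUES ('C211', 'P21');
INSERT INTO tree (element, parent) VALUES ('C212', 'P21');
INSERT INTO tree (element, parent) VALUES ('C221', 'P22');
INSERT INTO tree (element, parent) VALUES ('C222', 'P22');

-- Starting values, exactly as in the Data Table listing.
INSERT INTO data (element, value) VALUES ('C111', 3);
INSERT INTO data (element, value) VALUES ('C112', 3);
INSERT INTO data (element, value) VALUES ('C121', 3);
INSERT INTO data (element, value) VALUES ('C122', 3);
INSERT INTO data (element, value) VALUES ('C211', 3);
INSERT INTO data (element, value) VALUES ('C212', 3);
INSERT INTO data (element, value) VALUES ('C221', 3);
INSERT INTO data (element, value) VALUES ('C222', 3);
INSERT INTO data (element, value) VALUES ('P11',  6);

COMMIT;
```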
BigBoss

2 Answers

If all leaf nodes are at the same height (here lvl=4), you can write a simple CONNECT BY query with a ROLLUP:

SELECT lvl0,
       regexp_substr(path, '[^/]+', 1, 2) lvl1,
       regexp_substr(path, '[^/]+', 1, 3) lvl2,
       SUM(VALUE) sum_value
  FROM (SELECT sys_connect_by_path(t.element, '/') path,
               connect_by_root(t.element) lvl0,
               t.element, d.VALUE, LEVEL lvl
          FROM tree t
          LEFT JOIN DATA d ON d.element = t.element
         START WITH t.PARENT IS NULL
       CONNECT BY t.PARENT = PRIOR t.element)
 WHERE VALUE IS NOT NULL
   AND lvl = 4
 GROUP BY lvl0, ROLLUP(regexp_substr(path, '[^/]+', 1, 2),
                       regexp_substr(path, '[^/]+', 1, 3));

LVL0 LVL1  LVL2   SUM_VALUE
---- ----- ----- ----------
P0   P1    P11            6
P0   P1    P12            6
P0   P1                  12
P0   P2    P21            6
P0   P2    P22            6
P0   P2                  12
P0                       24

The insert would look like:

INSERT INTO data (element, value) 
(SELECT coalesce(lvl2, lvl1, lvl0), sum_value
   FROM <query> d_out
  WHERE NOT EXISTS (SELECT NULL
                      FROM data d_in
                     WHERE d_in.element = coalesce(lvl2, lvl1, lvl0)));

If the height of the leaf nodes is unknown/unbounded this gets more hairy. The above approach wouldn't work since ROLLUP needs to know exactly how many columns are to be considered.

In that case, you could use the tree structure in a self-join:

WITH HIERARCHY AS (
   SELECT t.element, path, VALUE
     FROM (SELECT sys_connect_by_path(t.element, '/') path,
                  connect_by_isleaf is_leaf, ELEMENT
             FROM tree t
            START WITH t.PARENT IS NULL
          CONNECT BY t.PARENT = PRIOR t.element) t
     LEFT JOIN DATA d ON d.element = t.element
                     AND t.is_leaf = 1
)
SELECT h.element, SUM(elements.value)
  FROM HIERARCHY h
  JOIN HIERARCHY elements ON elements.path LIKE h.path||'/%'
 WHERE h.VALUE IS NULL
 GROUP BY h.element
 ORDER BY 1;

ELEMENT SUM(ELEMENTS.VALUE)
------- -------------------
P0                       24
P1                       12
P11                       6
P12                       6
P2                       12
P21                       6
P22                       6
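
Wrapped in an INSERT, the self-join query could look like the sketch below. The NOT EXISTS filter is an addition, borrowed from the fixed-depth INSERT earlier in this answer, so that parents already present in data (such as P11) are skipped; the alias leaves is just a renamed elements. Treat it as a sketch: it has not been run against the asker's schema.

```sql
INSERT INTO data (element, value)
WITH hierarchy AS (
   -- One row per tree node with its full path; value is joined in
   -- only for leaf nodes, so every non-leaf row has value = NULL.
   SELECT t.element, t.path, d.value
     FROM (SELECT sys_connect_by_path(element, '/') path,
                  connect_by_isleaf is_leaf,
                  element
             FROM tree
            START WITH parent IS NULL
          CONNECT BY parent = PRIOR element) t
     LEFT JOIN data d ON d.element = t.element
                     AND t.is_leaf = 1
)
SELECT h.element, SUM(leaves.value)
  FROM hierarchy h
  -- Every node whose path extends h.path is a descendant of h.
  JOIN hierarchy leaves ON leaves.path LIKE h.path || '/%'
 WHERE h.value IS NULL
   -- Added filter: skip parents that already have a row in data.
   AND NOT EXISTS (SELECT NULL
                     FROM data d_in
                    WHERE d_in.element = h.element)
 GROUP BY h.element;
```

With the sample data this should add rows for P0, P1, P2, P12, P21 and P22, matching the "Rows to insert" list in the question. Note that LIKE treats % and _ in element names as wildcards, so names containing those characters would need an ESCAPE clause.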
Vincent Malgrat
  • Ah. The self-join option. That's what I was looking to write but couldn't work out how. You make it seem so simple. – Mike Meyers Nov 22 '11 at 13:16
  • @Vincent: It took me almost 40 minutes to understand the query, but I have now incorporated it into my base logic and can see it working. – BigBoss Nov 23 '11 at 12:53
  • @Vincent: I am using the second query you suggested, since the level is not fixed and varies for certain tree nodes. – BigBoss Nov 23 '11 at 13:01

Here is another option using the SQL MODEL clause. I've taken some hints from what Vincent has done in his answer (the use of regexp_substr) to simplify my code.

The first part, within the WITH clause just rejigs the data and extracts out the hierarchy at each level.

The model clause, at the end of the query, brings the data up from the lowest levels. This will need additional columns added if there are more than four levels but should work no matter at what level the values are held.

I'm not entirely sure that this will work in all circumstances since I'm not that experienced with the MODEL clause but it does at least seem to work in this case.

with my_hierarchy_data as (
select 
    element,
    value, 
    path, 
    parent,
    lvl0,
    regexp_substr(path, '[^/]+', 1, 2) as lvl1,
    regexp_substr(path, '[^/]+', 1, 3) as lvl2,
    regexp_substr(path, '[^/]+', 1, 4) as lvl3
from ( 
  select 
    element,
    value, 
    parent,
    sys_connect_by_path(element, '/') as path, 
    connect_by_root element as lvl0
  from 
    tree
    left outer join data using (element)
  start with parent is null
  connect by prior element = parent
  order siblings by element
  )
)
select 
    element,
    value, 
    path, 
    parent,
    new_value,
    lvl0, 
    lvl1, 
    lvl2, 
    lvl3
from my_hierarchy_data
model
return all rows
partition by (lvl0)
dimension by (lvl1, lvl2, lvl3)
measures(element, parent, value, value as new_value, path)
rules sequential order (
    new_value[lvl1, lvl2, null] = sum(value)[cv(lvl1), cv(lvl2), lvl3 is not null],
    new_value[lvl1, null, null] = sum(new_value)[cv(lvl1), lvl2 is not null, null],
    new_value[null, null, null] = sum(new_value)[lvl1 is not null, null, null]
)

The insert statement you can use is:

INSERT INTO data (element, value)
select element, new_value
from <the_query>
where value is null;
Mike Meyers
  • Well to be honest my first thought was to use the model clause but I didn't manage to work it out =) – Vincent Malgrat Nov 22 '11 at 13:41
  • @Mike: Your answer is good too. I chose Vincent's answer for the simplicity it provided. I still have not evaluated these answers for performance. But will do that if necessary. – BigBoss Nov 23 '11 at 13:02
  • @BigBoss I must admit I was rather impressed by the brevity of Vincent's queries. – Mike Meyers Nov 23 '11 at 14:44