8

I've a table like this which contains links :

key_a    key_b
--------------
a        b        
b        c
g        h     
a        g       
c        a
f        g

not really tidy & infinite recursion ...

key_a = parent key_b = child

Require a query which will recompose and attribute a number for each hierarchical group (parent + direct children + indirect children) :

key_a    key_b    nb_group
--------------------------
a        b        1
a        g        1
b        c        1
**c        a**        1
f        g        2
g        h        2

**link responsible of infinite loop**

Because we have

A-B-C-A

-> Only want to show simply the link as shown.

Any idea ?

Thanks in advance

Visée Maxence
  • 249
  • 3
  • 16
  • update : issue with infinite recursion – Visée Maxence Jul 30 '13 at 14:48
  • What does "Infinite recursion" mean? That there's no theoretical limit on the hierarchy? Or that the hierarchy loops at some point such that there is literally no childless node on some branches? --Edit: Looks like the latter, so what do you want to happen when you reach a loop? – Dominic P Aug 01 '13 at 13:44
  • because a parent can be a child in some case – Visée Maxence Aug 01 '13 at 13:47
  • I don't understand what is `nb_group`. Is it the level of the parent? if so, in the 3rd line shouldn't it be 2 ? – asafrob Aug 01 '13 at 14:06
  • I would like to identify each hierarchical group (parents + direct children + indirect children) – Visée Maxence Aug 01 '13 at 14:09
  • 7
    How can you know if the chain starts with `A` and not with `B` or `C` in `A-B-C-A-B-...`? – Andriy M Aug 01 '13 at 15:56
  • Hi Try this http://stackoverflow.com/questions/7428669/how-to-setup-the-maxrecursion-option-for-a-cte-inside-a-table-valued-function May be help you. – Harshil Aug 07 '13 at 13:23

4 Answers4

5

The problem is that you aren't really dealing with strict hierarchies; you're dealing with directed graphs, where some graphs have cycles. Notice that your nbgroup #1 doesn't have any canonical root-- it could be a, b, or c due to the cyclic reference from c-a.

The basic way of dealing with this is to think in terms of graph techniques, not recursion. In fact, an iterative approach (not using a CTE) is the only solution I can think of in SQL. The basic approach is explained here.

Here is a SQL Fiddle with a solution that addresses both the cycles and the shared-leaf case. Notice it uses iteration (with a failsafe to prevent runaway processes) and table variables to operate; I don't think there's any getting around this. Note also the changed sample data (a-g changed to a-h; explained below).

If you dig into the SQL you'll notice that I changed some key things from the solution given in the link. That solution was dealing with undirected edges, whereas your edges are directed (if you used undirected edges the entire sample set is a single component because of the a-g connection).

This gets to the heart of why I changed a-g to a-h in my sample data. Your specification of the problem is straightforward if only leaf nodes are shared; that's the specification I coded to. In this case, a-h and g-h can both get bundled off to their proper components with no problem, because we're concerned about reachability from parents (even given cycles).

However, when you have shared branches, it's not clear what you want to show. Consider the a-g link: given this, g-h could exist in either component (a-g-h or f-g-h). You put it in the second, but it could have been in the first instead, right? This ambiguity is why I didn't try to address it in this solution.

Edit: To be clear, in my solution above, if shared braches ARE encountered, it treats the whole set as a single component. Not what you described above, but it will have to be changed after the problem is clarified. Hopefully this gets you close.

Dominic P
  • 2,841
  • 1
  • 14
  • 16
2

You should use a recursive query. In the first part we select all records which are top level nodes (have no parents) and using ROW_NUMBER() assign them group ID numbers. Then in the recursive part we add to them children one by one and use parent's groups Id numbers.

with CTE as 
(

select t1.parent,t1.child,
       ROW_NUMBER() over (order by t1.parent) rn

from t t1 where 
not exists (select 1 from t where child=t1.parent)
union all
select t.parent,t.child, CTE.rn
from t  
join CTE on t.parent=CTE.Child  
)
select * from CTE
order by RN,parent

SQLFiddle demo

valex
  • 23,966
  • 7
  • 43
  • 60
  • Seems perfect but I think I've another issue ... some groups can have infinite recusion – Visée Maxence Jul 30 '13 at 14:16
  • @ViséeMaxence: You need to put a check on the union with the children to limit your recursion. http://stackoverflow.com/a/660145/128217 – zimdanen Aug 01 '13 at 13:25
0

Painful problem of graph walking using recursive CTEs. This is the problem of finding connected subgraphs in a graph. The challenge with using recursive CTEs is to prevent unwarranted recursion -- that is, infinite loops In SQL Server, that typically means storing them in a string.

The idea is to get a list of all pairs of nodes that are connected (and a node is connected with itself). Then, take the minimum from the list of connected nodes and use this as an id for the connected subgraph.

The other idea is to walk the graph in both directions from a node. This ensures that all possible nodes are visited. The following is query that accomplishes this:

with fullt as (
      select keyA, keyB
      from t
      union
      select keyB, keyA
      from t
     ),
     CTE as (
      select t.keyA, t.keyB, t.keyB as last, 1 as level,
             ','+cast(keyA as varchar(max))+','+cast(keyB as varchar(max))+',' as path
      from fullt t
      union all
      select cte.keyA, cte.keyB,
             (case when t.keyA = cte.last then t.keyB else t.keyA
              end) as last,
             1 + level,
             cte.path+t.keyB+','
      from fullt t join
           CTE
           on t.keyA = CTE.last or
              t.keyB = cte.keyA 
      where cte.path not like '%,'+t.keyB+',%'
     ) -- select * from cte where 'g' in (keyA, keyB)
select t.keyA, t.keyB,
       dense_rank() over (order by min(cte.Last)) as grp,
       min(cte.Last)
from t join
     CTE
     on (t.keyA = CTE.keyA and t.keyB = cte.keyB) or
        (t.keyA = CTE.keyB and t.keyB = cte.keyA)
where cte.path like '%,'+t.keyA+',%' or
      cte.path like '%,'+t.keyB+',%'
group by t.id, t.keyA, t.keyB
order by t.id;

The SQLFiddle is here.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I notice that your solution has the same problem as mine in terms of shared branches. Mine respects shared leaf nodes (as part of two separate components) but, like yours, will merge two components when anything more is shared. Still, I was interested to see the CTE implementation-- good work! – Dominic P Aug 05 '13 at 15:36
-1

you might want to check with COMMON TABLE EXPRESSIONS

here's the link

RoMEoMusTDiE
  • 4,739
  • 1
  • 17
  • 26