Recursive SQL query to find all matching identifiers

Question

I have a table with following structure

CREATE TABLE Source
(
     [ID1] INT, 
     [ID2] INT
);

INSERT INTO Source ([ID1], [ID2]) 
VALUES (1, 2), (2, 3), (4, 5),
       (2, 5), (6, 7)

Example of Source and Result tables:

Source table basically stores which id is matching which another id. From the diagram it can be seen that 1, 2, 3, 4, 5 are identical. And 6, 7 are identical. I need a SQL query to get a Result table with all matches between ids.

I found this item on the site - Recursive query in SQL Server similar to my task, but with a different result.

I tried to edit the code for my task, but it does not work. "The statement terminated. The maximum recursion 100 has been exhausted before statement completion."

;WITH CTE
AS
(
    SELECT DISTINCT
        M1.ID1,
        M1.ID1 as ID2
    FROM Source M1
        LEFT JOIN Source M2
            ON M1.ID1 = M2.ID2
    WHERE M2.ID2 IS NULL
    UNION ALL
    SELECT
        C.ID2,
        M.ID1
    FROM CTE C
        JOIN Source M
            ON C.ID1 = M.ID1
)
SELECT * FROM CTE ORDER BY ID1

Thanks a lot for the help!

Can you provide an example of your expected output? I'm not clear what you mean by "get a table with full dependencies between identifiers". — RToyo, Oct 03 '18 at 20:24
Please replace **pictures of text** with text and provide code you've got for now. — Ivan Starostin, Oct 03 '18 at 20:27
Please [include images inline](https://meta.stackexchange.com/a/75498/357280) if you must use them (unless perhaps they're gigantic). — Elaskanator, Oct 03 '18 at 20:28
Pictures of data and tables is generally not helpful. Why? https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557 — Sean Lange, Oct 03 '18 at 20:31
The editor did not let me insert the picture directly into the text. It was written: Not enough 10 something there ... — Vovi, Oct 03 '18 at 20:36
OK. I will try to replace the image with the text. I hoped that the picture would be clearer. — Vovi, Oct 03 '18 at 20:39

score 2 · Accepted Answer · answered Oct 03 '18 at 21:20

This is a challenging question. You are trying to walk through a graph in two directions. There are two key ideas:

Add "reverse" edges, so the graph behaves like a digraph but with edges in both directions.
Keep a list of edges that have been visited. In SQL Server, strings are one method.

So:

with s as (
      select id1, id2 from source
      union  -- on purpose
      select id2, id1 from source
     ),
     cte as (
      select s.id1, s.id2, ',' + cast(s.id1 as varchar(max)) + ',' + cast(s.id2 as varchar(max)) + ',' as ids
      from s
      union all
      select cte.id1, s.id2, ids + cast(s.id2 as varchar(max)) + ','
      from cte join
           s
           on cte.id2 = s.id1
      where cte.ids not like '%,' + cast(s.id2 as varchar(max)) + ',%'
     )
select *
from cte
order by 1, 2;

Here is a db<>fiddle.

Well, at least separator is different. Even tips are almost word-for-word, that's funny ) — Ivan Starostin, Oct 03 '18 at 21:33
Gordon, Ivan thank you for your answers! It's seems to me - this is a good solution of the problem! — Vovi, Oct 04 '18 at 08:52

score 2 · Answer 2 · answered Oct 03 '18 at 21:26

Since all node connections are bidirectional - add reversed relations to the original list
Find all possible paths from each node; almost usual recursion, the only difference is - we need to keep root id1
Avoid cycles - we need to be aware of it because we don't have directions

source:

;with src as(
  select id1, id2 from source
  union 
  -- reversed connections
  select id2, id1 from source
), rec as (
  select id1, id2, CAST(CONCAT('/', src.id1, '/', src.id2, '/') as varchar(8000)) path
  from src

  union all

  -- keep the root id1 from the start of each path
  select rec.id1, src.id2, CAST(CONCAT(rec.path, src.id2, '/') as varchar(8000))
  from rec
  -- usual recursion
  inner join src on src.id1 = rec.id2
  -- avoid cycles
  where rec.path not like CONCAT('%/', src.id2, '/%')
)
select id1, id2, path 
from rec
order by 1, 2

output

| id1 | id2 |      path |
|-----|-----|-----------|
|   1 |   2 |     /1/2/ |
|   1 |   3 |   /1/2/3/ |
|   1 |   4 | /1/2/5/4/ |
|   1 |   5 |   /1/2/5/ |
|   2 |   1 |     /2/1/ |
|   2 |   3 |     /2/3/ |
|   2 |   4 |   /2/5/4/ |
|   2 |   5 |     /2/5/ |
|   3 |   1 |   /3/2/1/ |
|   3 |   2 |     /3/2/ |
|   3 |   4 | /3/2/5/4/ |
|   3 |   5 |   /3/2/5/ |
|   4 |   1 | /4/5/2/1/ |
|   4 |   2 |   /4/5/2/ |
|   4 |   3 | /4/5/2/3/ |
|   4 |   5 |     /4/5/ |
|   5 |   1 |   /5/2/1/ |
|   5 |   2 |     /5/2/ |
|   5 |   3 |   /5/2/3/ |
|   5 |   4 |     /5/4/ |
|   6 |   7 |     /6/7/ |
|   7 |   6 |     /7/6/ |

http://sqlfiddle.com/#!18/76114/13

source table will contain about 100,000 records

There is nothing that can help you with this. The task is unpleasant - finding all possible connections. Almost CROSS JOIN. With even more connections in the end.

score 1 · Answer 3 · answered Oct 03 '18 at 21:55

Looks like I came up with a similar answer as the other posters. My approach was to insert the existing value pairs, and then insert the reverse of each pair.

Once you expand the list of value pairs, you can transverse the table to find all the pairs.

CREATE TABLE #Source
    ([ID1] int, [ID2] int);

INSERT INTO #Source 
(
    [ID1]
    ,[ID2]
) 
VALUES   
(1, 2)
,(2, 3)
,(4, 5)
,(2, 5)
,(6, 7)

INSERT INTO #Source 
(
    [ID1]
    ,[ID2]
) 
SELECT 
    [ID2]
    ,[ID1] 
FROM #Source

;WITH expanded AS
(
    SELECT DISTINCT 
        ID1 = s1.ID1
        ,ID2 = s1.ID2
    FROM #Source s1
    LEFT JOIN #Source s2 ON s1.ID2 = s2.ID1

    UNION

    SELECT DISTINCT 
        ID1 = s1.ID1
        ,ID2 = s2.ID2
    FROM #Source s1
    LEFT JOIN #Source s2 ON s1.ID2 = s2.ID1
    WHERE s1.ID1 <> s2.ID2

)
,recur AS
(
    SELECT DISTINCT 
        e1.ID1
        ,e1.ID2
    FROM expanded e1
    LEFT JOIN expanded e2 ON e1.ID2 = e2.ID1
    WHERE e1.ID1 <> e1.ID2

    UNION ALL

    SELECT DISTINCT 
        e1.ID1
        ,e2.ID2
    FROM expanded e1
    INNER JOIN expanded e2 ON e1.ID2 = e2.ID1
    WHERE e1.ID1 <> e2.ID2
)
SELECT DISTINCT 
    ID1, ID2 
FROM recur
ORDER BY ID1, ID2

DROP TABLE #Source

score 0 · Answer 4 · answered Oct 03 '18 at 20:59

This is a way to get that output by brute force, but may not be the best solution with a different/larger data set:

select sub1.rnk as ID1
,sub2.rnk as ID2
from
(
select a.*
,rank() over (partition by 1 order by id1, id2) as RNK
from source a
) sub1
cross join
(
select a.*
,rank() over (partition by 1 order by id1, id2) as RNK
from source a
) sub2
where sub1.rnk <> sub2.rnk
union all
select id1 as ID1
,id2 as ID2
from source
where id1 = 6
union all
select id2 as ID1
,id1 as ID2
from source
where id1 = 6;

thanks! But the data set will be different and the source table will contain about 100,000 records. This decision is too rough. — Vovi, Oct 03 '18 at 21:13

Recursive SQL query to find all matching identifiers

4 Answers4