Assign a unique ID to groups of rows that connect over 2 columns in SQL

Question

My situation is the following:

I've a table with 2 ID-Columns and I want to assign a unique ID to rows that link by either one. Here is an example of 6 rows in my table. All of these rows need to get the same unique ID.

Row	ID1	ID2
1	A	1
2	A	2
3	B	2
4	B	3
5	C	3
6	C	4

Rows 1 and 2 need to get the same unique id because they have the same ID1.

Row 3 needs to get that as well because its ID2 matches the ID2 of row 2.

Row 4 needs to get that as well because its ID1 matches the ID1 of row 3.

Row 5 needs to get that as well because its ID2 matches the ID2 of row 4.

Row 6 needs to get that as well because its ID1 matches the ID1 of row 5.

Basically the two columns form a chain and I want to assign an ID to that chain. Is there some reasonably efficient way to do this in SQL?

Looks like you want algorithm for graph components https://en.wikipedia.org/wiki/Component_(graph_theory) — Serg, Oct 30 '21 at 20:44

score 0 · Answer 1 · answered Oct 31 '21 at 11:08

Okay. This is really clumsy solution (and I should probably hand in my SQL Server badge for even suggesting it), but I think it'll get you over the line. I just hope you're not running this on a very large data set.

First up, I created some dummy temp tables to mimic your data (plus a few extra rows):

DROP TABLE IF EXISTS #Tbl1 ;
CREATE TABLE #Tbl1
(
    [id] TINYINT NOT NULL
    , [ID1] CHAR(1) NOT NULL
    , [ID2] TINYINT NOT NULL
) ;
INSERT
    INTO    #Tbl1 ( [id], [ID1], [ID2] )
    VALUES  ( 1, 'A', 1 ), ( 2, 'A', 2 ), ( 3, 'B', 2 )
            , ( 4, 'B', 3 ), ( 5, 'C', 3 ), ( 6, 'C', 4 )
            , ( 7, 'D', 5 ), ( 8, 'D', 5 ), ( 9, 'E', 6 ) ;

Then, using a CTE and the LAG function, I identified which rows would see an increment of a unique identifier, and dumped all this in temp table:

DROP TABLE IF EXISTS #Tbl2 ;
WITH cte_Lags AS
(
    SELECT          [id]
                    , [ID1]
                    , LAG ( [ID1], 1, '' )
                        OVER ( ORDER BY [ID1] ASC, [ID2] ASC ) AS [ID1_lag]
                    , [ID2]
                    , LAG ( [ID2], 1, 0 )
                        OVER ( ORDER BY [ID1] ASC, [ID2] ASC ) AS [ID2_lag]
    FROM            #Tbl1
)
SELECT          [id] AS [row]
                , [ID1]
                , [ID2]
                , CASE
                    WHEN    [ID1] = [ID1_lag]
                    OR      [ID2] = [ID2_lag]
                        THEN    0
                    ELSE        1
                    END AS [incr_id]
    INTO        #Tbl2
FROM            cte_Lags ;

I then add a column for your unique ID to the temp table:

ALTER TABLE #Tbl2 ADD [unique_id] TINYINT NULL ;

Now this is where it gets real messy!

I create a iterative loop that cycles through each row of the temp table and updates the unique_id column with the appropriate number, incremented only if the row is flagged to be incremented:

DECLARE @RowNum AS TINYINT ;
DECLARE @i AS TINYINT = 0 ;

WHILE   ( ( SELECT COUNT(*) FROM #Tbl2 WHERE [unique_id] IS NULL ) > 0 )
BEGIN
    SELECT      TOP(1) @RowNum = [row]
    FROM        #Tbl2
    WHERE       [unique_id] IS NULL
    ORDER BY    [ID1] ASC, [ID2] ASC, [row] ASC ;

    IF          ( ( SELECT [incr_id] FROM #Tbl2 WHERE [row] = @RowNum ) = 1 )
        SET         @i += 1 ;

    UPDATE      #Tbl2
    SET         [unique_id] = @i
    WHERE       [row] = @RowNum ;
END

SELECT      [row]
            , [ID1]
            , [ID2]
            , [unique_id]
FROM        #Tbl2
ORDER BY    [ID1] ASC, [ID2] ASC ;

Now this all assumes that data doesn't repeat further down the table -- e.g. ('A', 1) is not going to reappear at row 50. If it does, this'll all need a little rethink.

I really hope someone cleverer than I can do this for you in a simple recursive CTE or a funky grouping function. But until then, this'll keep the boss happy.

score 0 · Answer 2 · answered Oct 31 '21 at 12:39

A recursive CTE is useful for something like this.
Traverse through the records, then group the results.

WITH RCTE AS (
   SELECT
     [Row] AS BaseRow
   , ID1 AS BaseID1
   , ID2 AS BaseID2
   , 1 AS Lvl
   , [Row], ID1, ID2
   FROM YourTable
   UNION ALL
   SELECT 
     c.BaseRow
   , c.BaseID1
   , c.BaseID2
   , c.Lvl + 1
   , t.[Row], t.ID1, t.ID2
   FROM RCTE c 
   JOIN YourTable t
     ON t.[Row] < c.[Row] 
    AND (t.ID2 = c.ID2 OR (t.ID2 < c.ID2 AND t.ID1 = c.ID1))
)
SELECT 
  BaseRow AS [Row]
, BaseID1 AS ID1
, MIN(ID2) AS ID2
FROM RCTE
GROUP BY BaseRow, BaseID1
ORDER BY BaseRow, BaseID1;

db<>fiddle here

Assign a unique ID to groups of rows that connect over 2 columns in SQL

2 Answers2