0

The table contains PID and corresponding values. PIDs with same values should be assigned same group id. A, B, D are directly connected (value = ABC) and since C is connected with D (value = GHI), A,B,C & D are all part of the same group.

PID    Value 
--------------
A101    ABC    
A101    XYZ       
B101    ABC     
C101    GHI     
C101    DEF    
D101    ABC    
D101    GHI    
E101    KLM    
F101    MNO    

Expected Result:

PID    GroupID
--------------
A101     1       
B101     1       
C101     1     
D101     1     
E101     2  
F101     3

I have tried below query without success:

with cte as (     
select PID, Value, Rank() OVER(ORDER BY Value) rn          
from  t           
union all         
select t.PID, cte.Value, cte.rn       
from t join cte       
     on t.Value = cte.Value and              
        t.PID < cte.PID  
)           
select *      
from cte

Is there a way to fix the query and also write a more efficient query (about 1 million rows) in SQL Server 2012/2014?

M s
  • 1
  • 1

1 Answers1

0

try the following:

declare @tab table (PID varchar(10), Val varchar(100))
insert into @tab select 'A101', 'ABC'
insert into @tab select 'A101', 'XYZ'
insert into @tab select 'B101', 'ABC'
insert into @tab select 'C101', 'GHI'
insert into @tab select 'C101', 'DEF'
insert into @tab select 'D101', 'ABC'
insert into @tab select 'D101', 'GHI'
insert into @tab select 'E101', 'KLM'
insert into @tab select 'F101', 'MNO'

select PID, dense_rank() over (order by group_id) GROUP_ID
from
(
select x.PID, min(t.num) group_id from @tab X
join (select val, dense_rank() over (order by val) num from @tab)t on t.Val = X.Val
group by x.PID
)t2
order by 2
sacse
  • 3,634
  • 2
  • 15
  • 24
  • This doesn't work as C101 gets a different group ID. C101 should be assigned the same group id as A101, B101 and D101. – M s Jun 24 '18 at 19:28