1

I have the following data

create table company (com_cd varchar2(100), act_id number(10));

insert into company values ('IBM',100);
insert into company values ('IBM',200);
insert into company values ('IBM',300);
insert into company values ('HP',50);
insert into company values ('HP',85);

Then I ran this query

select COM_CD, 
REGEXP_SUBSTR(DUP_ACT_ID,'[^,]+',1,1) AS DUP_ACT_1,
REGEXP_SUBSTR(DUP_ACT_ID,'[^,]+',1,2) AS DUP_ACT_2,
REGEXP_SUBSTR(DUP_ACT_ID,'[^,]+',1,3) AS DUP_ACT_3 
FROM (
SELECT COM_CD, LISTAGG(ACT_ID,',')  DUP_ACT_ID
FROM COMPANY
GROUP BY COM_CD HAVING COUNT(*) > 1
);

COM_CD DUP_ACT_1 DUP_ACT_2 DUP_ACT_3
HP     50        85
IBM    100       200       300

It gave me the correct output but I won't know in a table with thousands of values whether there would be 1 or 10 duplicate ACT_ID. So I am looking for a solution where I don't use regexp_substr and instead get the output somehow based on number of values in the DUP_ACT_ID column. Any help is appreciated.

astentx
  • 6,393
  • 2
  • 16
  • 25
Sam
  • 15
  • 4

1 Answers1

0

If you don't have to have a single row per com_cd, you could use something like the following:

SELECT com_cd,seq,act_id dup_act_id from
    (select distinct com_cd,act_id,
       dense_rank() over (partition by com_cd order by act_id) seq,
       count(distinct act_id) over (partition by com_cd) cnt
from company)
where cnt > 1
order by com_cd,seq

COM_CD  SEQ DUP_ACT_ID
HP      1   50
HP      2   85
IBM     1   100
IBM     2   200
IBM     3   300

You didn't mention what should happen if we see the same act_id multiple times within a com_cd, I assumed you would only want to see it once.

If you need a new column for each dup_act_id, you could pivot the above query:

WITH dup_accts as (SELECT com_cd,seq,act_id dup_act_id from
    (select distinct com_cd,act_id,
       dense_rank() over (partition by com_cd order by act_id) seq,
       count(distinct act_id) over (partition by com_cd) cnt
from company)
where cnt > 1)
select * from dup_accts pivot(max(dup_act_id) for seq in (1 dup_act_1,2 dup_act_2,3 dup_act_3))

COM_CD  DUP_ACT_1   DUP_ACT_2   DUP_ACT_3
IBM    100          200         300
HP     50           85

But in that case you'd still have to add a new section to the in() clause for each additional dup_act_id.

Mike
  • 1,039
  • 1
  • 12
  • 20
  • Yes. The output needs to be in a column format. A separate column for each duplicate value. Unfortunately, that's how the business likes to see it. – Sam Aug 11 '22 at 03:15
  • @Sam I feel your pain. As far as I know there’s no way to get a variable # of columns just using SQL. So you can either write the query with enough columns to cover the max # of dupes you expect, or first get a count of the max # and then generate the query to support that #. – Mike Aug 11 '22 at 03:30