0

I have the below table

Table A:

row_number  id    start_dt    end_dt   cust_dt    cust_id
   1        101    4/8/19     4/20/19   4/10/19   725
   2        101    4/21/19    5/20/19   4/10/19   456
   3        101    5/1/19     6/30/19   4/10/19   725
   4        101    7/1/19     8/20/19   4/10/19   725

I need to count "duplicates" in a table for testing purposes.

Criteria: Need to exclude the start_dt and end_dt from my calculation. It's only a duplicate if lead row is duplicated. So, for example row 1, row 3 or 4 are the same but only row 3 and 4 would be considered duplicates in this example.

What I have tried: rank with a lead and self join but that doesn't seem to be working on my end.

How can I count the id to determine if there are duplicates?

Output: (something like below)

count    id 
  2      101

End results for me is to have a count of 1 for the table

count  id
 1     101
Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
Mr John
  • 231
  • 1
  • 3
  • 18
  • Why do you have two columns called `id`? – GMB Jan 21 '20 at 16:37
  • 1
    @GMB updated the last id to cust_id – Mr John Jan 21 '20 at 16:39
  • You can use ROW_NUMBER() OVER (PARTITION BY id,start_dt,end_d,cust_dt, cut_id ORDER BY id,start_dt,end_d,cust_dt, cut_id) as RN and then everything which has RN > 1 is already present, so you can easily count those. – Volokh Jan 21 '20 at 16:45

2 Answers2

2

Use row_number analytical function as following (gaps and island problem):

Select count(1), id from
(Select t.*, 
        row_number() over (order by row_number) as rn,
        row_number() over (partition by id, cust_dt, cust_id order by row_number) as part_rn
   From your_table t)
Group by id, cust_dt, cust_id, (rn-part_rn)
Having count(1) > 1

db<>fiddle demo

Cheers!!

Popeye
  • 35,427
  • 4
  • 10
  • 31
1

If your definition of a duplicated row is: the CUST_IDin the lead row (with same id order by row_number) equalst to the current CUST_ID,

you may write it down simple using the LEAD analytic function.

select ID, ROW_NUMBER, CUST_ID,
case when CUST_ID = lead(CUST_ID) over (partition by id order by ROW_NUMBER) then 1 end is_dup
from tab

        ID ROW_NUMBER    CUST_ID     IS_DUP
---------- ---------- ---------- ----------
       101          1        725           
       101          2        456           
       101          3        725          1
       101          4        725       

The aggregated query to get the number of duplicated rows would than be

with dup as (
select ID, ROW_NUMBER, CUST_ID,
case when CUST_ID = lead(CUST_ID) over (partition by id order by ROW_NUMBER) then 1 end is_dup
from tab)
select ID, sum(is_dup) dup_cnt
from dup
group by ID

        ID    DUP_CNT
---------- ----------
       101          1
Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53