Count lead duplicate rows

Question

I have the table below:

Table A:

row_number  id    start_dt    end_dt   cust_dt    cust_id
   1        101    4/8/19     4/20/19   4/10/19   725
   2        101    4/21/19    5/20/19   4/10/19   456
   3        101    5/1/19     6/30/19   4/10/19   725
   4        101    7/1/19     8/20/19   4/10/19   725

I need to count "duplicates" in a table for testing purposes.

Criteria: start_dt and end_dt must be excluded from the comparison. A row is only a duplicate if the lead (next) row has the same values. For example, rows 1, 3 and 4 have the same values, but only rows 3 and 4 are considered duplicates, because row 2 sits between row 1 and row 3.

What I have tried: RANK with LEAD, and a self join, but neither seems to work on my end.

How can I count rows per id to determine whether there are duplicates?

Expected output (something like this):

count    id 
  2      101

The end result I need is a count of 1 for the table:

count  id
 1     101
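To make the duplicate rule in the criteria concrete, here is a minimal plain-Python sketch (not SQL, and not part of the original post). It assumes rows can be ordered by row_number within each id, ignores start_dt and end_dt, and counts a row as a duplicate when the row right after it has the same id, cust_dt and cust_id. The names `Row`, `key` and `count_lead_duplicates` are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    row_number: int
    id: int
    start_dt: str
    end_dt: str
    cust_dt: str
    cust_id: int


# Sample rows from Table A (dates kept as text; they are never compared).
ROWS = [
    Row(1, 101, "4/8/19", "4/20/19", "4/10/19", 725),
    Row(2, 101, "4/21/19", "5/20/19", "4/10/19", 456),
    Row(3, 101, "5/1/19", "6/30/19", "4/10/19", 725),
    Row(4, 101, "7/1/19", "8/20/19", "4/10/19", 725),
]


def key(r: Row) -> tuple:
    # start_dt and end_dt are deliberately left out of the comparison.
    return (r.id, r.cust_dt, r.cust_id)


def count_lead_duplicates(rows: list[Row]) -> dict[int, int]:
    """Per id, count rows whose next row (by row_number) has the same key."""
    ordered = sorted(rows, key=lambda r: (r.id, r.row_number))
    counts: dict[int, int] = {}
    # Compare each row with its immediate successor (the "lead" row).
    for cur, nxt in zip(ordered, ordered[1:]):
        if cur.id == nxt.id and key(cur) == key(nxt):
            counts[cur.id] = counts.get(cur.id, 0) + 1
    return counts


print(count_lead_duplicates(ROWS))  # {101: 1}
```

On the sample only the pair (row 3, row 4) matches, so id 101 gets a count of 1; row 1 is not counted because row 2 differs from it.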

You can use ROW_NUMBER() OVER (PARTITION BY id, start_dt, end_dt, cust_dt, cust_id ORDER BY id, start_dt, end_dt, cust_dt, cust_id) as RN; every row with RN > 1 is already present, so you can easily count those. — Volokh, Jan 21 '20 at 16:45
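A minimal sketch of this comment's suggestion, run on the sample rows with Python's built-in sqlite3 module rather than Oracle (an assumption; window functions need SQLite 3.25 or later). The table name `table_a` is hypothetical, and the columns use the table's names end_dt and cust_id. Because start_dt and end_dt differ on every sample row, each row forms its own partition when they are included, so no row gets RN > 1.

```python
import sqlite3

# Hypothetical in-memory copy of Table A; dates are plain text because
# they are only compared for equality here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a (row_number INTEGER, id INTEGER, start_dt TEXT,"
    " end_dt TEXT, cust_dt TEXT, cust_id INTEGER)"
)
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 101, "4/8/19", "4/20/19", "4/10/19", 725),
        (2, 101, "4/21/19", "5/20/19", "4/10/19", 456),
        (3, 101, "5/1/19", "6/30/19", "4/10/19", 725),
        (4, 101, "7/1/19", "8/20/19", "4/10/19", 725),
    ],
)

# Number rows within each group of identical values and count every
# row after the first one in its group (RN > 1).
query = """
SELECT COUNT(*)
FROM (
  SELECT ROW_NUMBER() OVER (
           PARTITION BY id, start_dt, end_dt, cust_dt, cust_id
           ORDER BY id, start_dt, end_dt, cust_dt, cust_id) AS rn
  FROM table_a
)
WHERE rn > 1
"""
(extra_rows,) = conn.execute(query).fetchone()
print(extra_rows)  # 0 for the sample rows
```

Leaving start_dt and end_dt out of the PARTITION BY would give 2 for the sample (two of the three cust_id 725 rows get RN > 1), but that version does not check whether the matching rows are adjacent.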

Popeye · Answer 1 · 2020-01-22T08:14:19.353


Use the ROW_NUMBER analytic function as follows (this is a gaps-and-islands problem):

Select count(1), id from
(Select t.*, 
        row_number() over (partition by id order by row_number) as rn,
        row_number() over (partition by id, cust_dt, cust_id order by row_number) as part_rn
   From your_table t)
Group by id, cust_dt, cust_id, (rn-part_rn)
Having count(1) > 1

db<>fiddle demo

Cheers!!
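The gaps-and-islands query can be tried locally with a minimal sketch that uses Python's sqlite3 module instead of Oracle (an assumption; window functions need SQLite 3.25+). The table name `your_table` follows the answer and the data is Table A from the question. In this sketch rn is partitioned by id so that numbering restarts for each id; with a single id, as in the sample, that makes no difference.

```python
import sqlite3

# In-memory copy of Table A; dates are plain text (only compared for equality).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (row_number INTEGER, id INTEGER, start_dt TEXT,"
    " end_dt TEXT, cust_dt TEXT, cust_id INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 101, "4/8/19", "4/20/19", "4/10/19", 725),
        (2, 101, "4/21/19", "5/20/19", "4/10/19", 456),
        (3, 101, "5/1/19", "6/30/19", "4/10/19", 725),
        (4, 101, "7/1/19", "8/20/19", "4/10/19", 725),
    ],
)

# Consecutive rows with the same (id, cust_dt, cust_id) share the same
# value of rn - part_rn, so grouping on it yields one row per "island".
query = """
SELECT COUNT(*) AS cnt, id
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY row_number) AS rn,
         ROW_NUMBER() OVER (PARTITION BY id, cust_dt, cust_id
                            ORDER BY row_number) AS part_rn
  FROM your_table t
)
GROUP BY id, cust_dt, cust_id, rn - part_rn
HAVING COUNT(*) > 1
"""
result = conn.execute(query).fetchall()
print(result)  # [(2, 101)]
```

Row 1 gets rn - part_rn = 0 while rows 3 and 4 both get 1, so only rows 3 and 4 form an island of more than one row, giving the count of 2.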

edited Jan 22 '20 at 08:14

answered Jan 21 '20 at 17:16

Popeye


@BarbarosÖzhan, I think it will yield 2 as the output, which is what is required. – Popeye Jan 22 '20 at 00:47
have a look at [this](https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=0765a8b789595105e5ce2f5ae6a3b375) . – Barbaros Özhan Jan 22 '20 at 07:14

Oh yes, I updated the answer with a changed `GROUP BY` clause. – Popeye Jan 22 '20 at 08:15

score 1 · Answer 2 · answered Jan 21 '20 at 18:29

If your definition of a duplicated row is that the CUST_ID in the lead row (same ID, ordered by ROW_NUMBER) equals the current CUST_ID, you can write it simply using the LEAD analytic function.

select ID, ROW_NUMBER, CUST_ID,
case when CUST_ID = lead(CUST_ID) over (partition by id order by ROW_NUMBER) then 1 end is_dup
from tab

        ID ROW_NUMBER    CUST_ID     IS_DUP
---------- ---------- ---------- ----------
       101          1        725           
       101          2        456           
       101          3        725          1
       101          4        725
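A minimal sketch that runs this LEAD query on the sample rows with Python's sqlite3 module (an assumption; the answer targets Oracle, and SQLite needs version 3.25+ for LEAD). The table name `tab` follows the answer. A final ORDER BY is added only so the printed order is deterministic.

```python
import sqlite3

# In-memory copy of the sample rows; dates are plain text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (row_number INTEGER, id INTEGER, start_dt TEXT,"
    " end_dt TEXT, cust_dt TEXT, cust_id INTEGER)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 101, "4/8/19", "4/20/19", "4/10/19", 725),
        (2, 101, "4/21/19", "5/20/19", "4/10/19", 456),
        (3, 101, "5/1/19", "6/30/19", "4/10/19", 725),
        (4, 101, "7/1/19", "8/20/19", "4/10/19", 725),
    ],
)

# Flag a row with 1 when the next row of the same id has the same cust_id.
# The last row's LEAD is NULL, so the comparison is not true and is_dup stays NULL.
query = """
SELECT id, row_number, cust_id,
       CASE WHEN cust_id = LEAD(cust_id) OVER (PARTITION BY id ORDER BY row_number)
            THEN 1 END AS is_dup
FROM tab
ORDER BY id, row_number
"""
rows = conn.execute(query).fetchall()
for r in rows:
    print(r)
# (101, 1, 725, None)
# (101, 2, 456, None)
# (101, 3, 725, 1)
# (101, 4, 725, None)
```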

The aggregated query to get the number of duplicated rows would then be:

with dup as (
select ID, ROW_NUMBER, CUST_ID,
case when CUST_ID = lead(CUST_ID) over (partition by id order by ROW_NUMBER) then 1 end is_dup
from tab)
select ID, sum(is_dup) dup_cnt
from dup
group by ID

        ID    DUP_CNT
---------- ----------
       101          1
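The aggregated version can be checked the same way: a minimal sketch on SQLite through Python's sqlite3 module (an assumption; the answer targets Oracle), using the answer's table name `tab` and the question's sample rows.

```python
import sqlite3

# In-memory copy of the sample rows; dates are plain text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (row_number INTEGER, id INTEGER, start_dt TEXT,"
    " end_dt TEXT, cust_dt TEXT, cust_id INTEGER)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 101, "4/8/19", "4/20/19", "4/10/19", 725),
        (2, 101, "4/21/19", "5/20/19", "4/10/19", 456),
        (3, 101, "5/1/19", "6/30/19", "4/10/19", 725),
        (4, 101, "7/1/19", "8/20/19", "4/10/19", 725),
    ],
)

# Flag duplicates with LEAD in a CTE, then sum the flags per id.
query = """
WITH dup AS (
  SELECT id, row_number, cust_id,
         CASE WHEN cust_id = LEAD(cust_id) OVER (PARTITION BY id ORDER BY row_number)
              THEN 1 END AS is_dup
  FROM tab
)
SELECT id, SUM(is_dup) AS dup_cnt
FROM dup
GROUP BY id
"""
result = conn.execute(query).fetchall()
print(result)  # [(101, 1)]
```

For an id with no duplicates, SUM over only NULL flags returns NULL; writing COALESCE(SUM(is_dup), 0) would report 0 instead.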
