Reissue ids to records with duplicates

Question

We have a CRM DB which for the last 6 weeks has been creating duplicate CaseID's

I need to go in and give new case id's int he 20000000 range to all of the duplicates.

So I have found all the duplicates like this

SELECT CaseNumber, 
    COUNT(CaseNumber) AS NumOccurrences
FROM Goldmine.dbo.cases
WHERE CaseNumber > 9000000
GROUP BY CaseNumber
HAVING ( COUNT(CaseNumber) > 1 )

Which brought back this.

query results

I now need to renumber each one of these like so 20000001, 20000002, etc etc

Any help would be great.

What have you tried so far, why didn't it work? What RDBMS are you *really* using? — Thom A, Jul 05 '20 at 17:22
@ChrisLeonard . . . Based on the "dbo" and three-part naming, I am assuming you are using SQL Server and I removed the MySQL tag. Please tag only with the database you are really using. — Gordon Linoff, Jul 05 '20 at 17:25
@ChrisLeonard...While re-numbering the duplicates do you need the values to be contiguous(eg: say you had values (8,8,8,9,9,10), and after update is (8,9,10,12,13,15) acceptable? — George Joseph, Jul 05 '20 at 18:28

Gordon Linoff · Answer 1 · 2020-07-06T01:05:24.817

0

I am going to assume that you are using SQL Server. So you can use updatable CTEs:

WITH dups as (
      SELECT c.*,
             ROW_NUMBER() OVER (ORDER BY CaseNumber) as seqnum
      FROM Goldmine.dbo.cases c
      WHERE CaseNumber > 9000000
     ),
     toupdate as (
      SELECT d.*, ROW_NUMBER() OVER (PARTITION BY CaseNumber ORDER BY CaseNumber) as inc
      FROM dups d
      WHERE seqnum > 1
     )
UPDATE toupdate
    SET CaseNumber = 20000000 + inc;

The first subquery identifies the duplicates by enumerating them. Presumably, you don't want the "first" one to change. So the second CTE selects only the real duplicates and assigns a sequential number. The outer update uses that to assign the new number.

edited Jul 06 '20 at 01:05

answered Jul 05 '20 at 17:24

Gordon Linoff

1,242,037
58
646
786

Hi thanks for the fast help, this is great what you have written and has renumbered all of them but its kept the duplicates. I want 0 duplicates at the end so the duplicates also need renumbering and give NEW numbers also please. – Chris Leonard Jul 05 '20 at 17:47
As you can see here I still have lots of duplicates against the new numbers. https://ibb.co/GCLKr0s – Chris Leonard Jul 05 '20 at 17:57
Hi, Sorry maybe I am confusing you, I now have lots of case 20000001. The query runs but I want it to find all duplicates > 9000000 and then renumber them from 20000000 upwards. I should be left with 0 duplicates but I still have thousands. Sorry to be a pain I really do appreciate your help. – Chris Leonard Jul 05 '20 at 18:14
I am so close to doing this with all the help I have been given by you guys. I am sorry for not understanding so much. How can I put this all together to do this to my database as above... do I have to combine to 2 queries? – Chris Leonard Jul 05 '20 at 20:36
anyone? Can you please help me combine this to work? – Chris Leonard Jul 05 '20 at 21:34
@ChrisLeonard . . . Oops . . . there shouldn't be a `partition by` in the second CTE. – Gordon Linoff Jul 06 '20 at 01:05

George Joseph · Answer 2 · 2020-07-05T18:38:40.987

By the look of the data you have got overlaps in numbers because there are records which overlap with the "updated" values if we are to increment by 1. Here is a way to fix this,

with data
  as (select *
            ,count(*) over(partition by x) as cnt
            ,row_number() over(order by x) as rnk
        from t
      )
update data
   set x = x+rnk;

Initial record set

+-----------+
| orig_data |
+-----------+
|  10000009 |
|  10000009 |
|  10000009 |
|  10000009 |
|  10000010 |
|  10000010 |
|  10000011 |
+-----------+

After update

+-----------+
| after_upd |
+-----------+
|  10000010 |
|  10000011 |
|  10000012 |
|  10000014 |
|  10000015 |
|  10000017 |
+-----------+

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c4ea8335abb074b8c0143e2f7c767f04

Reissue ids to records with duplicates

2 Answers2