2

We have a CRM DB which for the last 6 weeks has been creating duplicate CaseID's

I need to go in and give new case id's int he 20000000 range to all of the duplicates.

So I have found all the duplicates like this

SELECT CaseNumber, 
    COUNT(CaseNumber) AS NumOccurrences
FROM Goldmine.dbo.cases
WHERE CaseNumber > 9000000
GROUP BY CaseNumber
HAVING ( COUNT(CaseNumber) > 1 )

Which brought back this.

query results

I now need to renumber each one of these like so 20000001, 20000002, etc etc

Any help would be great.

Dale K
  • 25,246
  • 15
  • 42
  • 71
  • What have you tried so far, why didn't it work? What RDBMS are you *really* using? – Thom A Jul 05 '20 at 17:22
  • 1
    @ChrisLeonard . . . Based on the "dbo" and three-part naming, I am assuming you are using SQL Server and I removed the MySQL tag. Please tag only with the database you are really using. – Gordon Linoff Jul 05 '20 at 17:25
  • @ChrisLeonard...While re-numbering the duplicates do you need the values to be contiguous(eg: say you had values (8,8,8,9,9,10), and after update is (8,9,10,12,13,15) acceptable? – George Joseph Jul 05 '20 at 18:28

2 Answers2

0

I am going to assume that you are using SQL Server. So you can use updatable CTEs:

WITH dups as (
      SELECT c.*,
             ROW_NUMBER() OVER (ORDER BY CaseNumber) as seqnum
      FROM Goldmine.dbo.cases c
      WHERE CaseNumber > 9000000
     ),
     toupdate as (
      SELECT d.*, ROW_NUMBER() OVER (PARTITION BY CaseNumber ORDER BY CaseNumber) as inc
      FROM dups d
      WHERE seqnum > 1
     )
UPDATE toupdate
    SET CaseNumber = 20000000 + inc;

The first subquery identifies the duplicates by enumerating them. Presumably, you don't want the "first" one to change. So the second CTE selects only the real duplicates and assigns a sequential number. The outer update uses that to assign the new number.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Hi thanks for the fast help, this is great what you have written and has renumbered all of them but its kept the duplicates. I want 0 duplicates at the end so the duplicates also need renumbering and give NEW numbers also please. – Chris Leonard Jul 05 '20 at 17:47
  • As you can see here I still have lots of duplicates against the new numbers. https://ibb.co/GCLKr0s – Chris Leonard Jul 05 '20 at 17:57
  • Hi, Sorry maybe I am confusing you, I now have lots of case 20000001. The query runs but I want it to find all duplicates > 9000000 and then renumber them from 20000000 upwards. I should be left with 0 duplicates but I still have thousands. Sorry to be a pain I really do appreciate your help. – Chris Leonard Jul 05 '20 at 18:14
  • I am so close to doing this with all the help I have been given by you guys. I am sorry for not understanding so much. How can I put this all together to do this to my database as above... do I have to combine to 2 queries? – Chris Leonard Jul 05 '20 at 20:36
  • anyone? Can you please help me combine this to work? – Chris Leonard Jul 05 '20 at 21:34
  • @ChrisLeonard . . . Oops . . . there shouldn't be a `partition by` in the second CTE. – Gordon Linoff Jul 06 '20 at 01:05
0

By the look of the data you have got overlaps in numbers because there are records which overlap with the "updated" values if we are to increment by 1. Here is a way to fix this,

with data
  as (select *
            ,count(*) over(partition by x) as cnt
            ,row_number() over(order by x) as rnk
        from t
      )
update data
   set x = x+rnk;

Initial record set

+-----------+
| orig_data |
+-----------+
|  10000009 |
|  10000009 |
|  10000009 |
|  10000009 |
|  10000010 |
|  10000010 |
|  10000011 |
+-----------+

After update

+-----------+
| after_upd |
+-----------+
|  10000010 |
|  10000011 |
|  10000012 |
|  10000014 |
|  10000015 |
|  10000017 |
+-----------+

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c4ea8335abb074b8c0143e2f7c767f04

George Joseph
  • 5,842
  • 10
  • 24