
I have records that aren't unique, but I can't delete any of the duplicates because the data has a valid reason for being there. Normally I spend time manually editing every single line, adding 1, 2, 3, 4 in sequence for however many duplicates occur. I'm looking for help automating this process.

Updating the fields is fairly simple, and concatenating them with anything else is easily done. However, I'd like to be able to run a script that updates the table by finding all the duplicates, ranking them, and then appending that rank to each duplicate to make it unique. It's the ranking bit where I'm slightly out of my depth.

For context, I've run scripts that update the keys by appending the row numbers; however, the dataset changes slightly each month, so the row numbers get thrown out. Example below.

    BEGIN
        -- Rank each duplicate within its group of identical values.
        -- Note: ORDER BY uses the same column as PARTITION BY, so the
        -- order of the suffixes within a group is arbitrary.
        ;WITH [CTE] AS
        (
            SELECT
                ROW_NUMBER() OVER (PARTITION BY [COLUMN_NAME] ORDER BY [COLUMN_NAME]) AS [ROWNUMBER],
                [COLUMN_NAME]
            FROM [TABLE_NAME]
        )
        -- Append the rank to every duplicate after the first occurrence.
        UPDATE [CTE]
        SET [COLUMN_NAME] = CONCAT([COLUMN_NAME], ([ROWNUMBER] - 1))
        WHERE [ROWNUMBER] > 1

        -- Show the results.
        SELECT [COLUMN_NAME] FROM [TABLE_NAME] ORDER BY [COLUMN_NAME]
    END

If this can be edited in any way to work as described above, that would be brilliant; if not, and anyone more learned than myself can point me in another direction, I would be most grateful.

Skalrus
  • I am not understanding your problem. [This db<>fiddle](https://dbfiddle.uk/HbnZLAWj) executes the code that you posted. My only changes were to (1) change the table and column names to something a bit more readable, and (2) replace the ORDER BY column with something other than PARTITION BY column. It seems to produce reasonable results. Can you explain further what you need it to do differently? – T N Jun 23 '23 at 21:41

1 Answer


Primary keys are unique by definition. Moreover, I wouldn't suggest updating your primary key, because it could cause problems if it is used as a foreign key in other tables, for example. (Take a look here.)

If your problem is with the primary keys you are trying to insert, and not with those already in the table, why not apply your procedure to just the PKs before inserting them?
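For example, something along these lines, assuming the incoming rows land in a staging table first ([STAGING_TABLE] and the other names here are placeholders, not your actual schema):

    -- Hypothetical sketch: suffix the duplicate keys in a staging table
    -- before the rows ever reach the final table.
    ;WITH [STAGING_CTE] AS
    (
        SELECT
            ROW_NUMBER() OVER (PARTITION BY [COLUMN_NAME] ORDER BY [COLUMN_NAME]) AS [ROWNUMBER],
            [COLUMN_NAME]
        FROM [STAGING_TABLE]
    )
    UPDATE [STAGING_CTE]
    SET [COLUMN_NAME] = CONCAT([COLUMN_NAME], ([ROWNUMBER] - 1))
    WHERE [ROWNUMBER] > 1;

    -- The keys are already unique by the time they are inserted.
    -- (Note: this does not guard against keys that already exist in the
    -- final table from an earlier load.)
    INSERT INTO [TABLE_NAME] ([COLUMN_NAME])
    SELECT [COLUMN_NAME] FROM [STAGING_TABLE];

Because the numbering is recomputed from the staging data on each run, it no longer depends on row positions in the final table.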

I suggest using auto-generated primary keys, and if you are trying to insert rows coming from another database (which should be the case if the primary key is already set), then alter the keys manually before inserting, as you were trying to do.
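As a minimal sketch of the auto-generated option (the table and column definitions here are made up), an IDENTITY column lets SQL Server produce the key itself, so duplicates in the business column never violate the key:

    -- Hypothetical table: the surrogate key is generated automatically.
    CREATE TABLE [FINAL_TABLE]
    (
        [ID] INT IDENTITY(1,1) PRIMARY KEY,   -- auto-generated surrogate key
        [COLUMN_NAME] VARCHAR(100) NOT NULL   -- the (possibly duplicated) business value
    );

    INSERT INTO [FINAL_TABLE] ([COLUMN_NAME])
    VALUES ('ABC'), ('ABC'), ('DEF');         -- duplicates insert without error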

bschaffh
  • My bad, I didn't explain it very well: the column that makes them a duplicate becomes the primary key in another table, which is why I called them a PK. The validation that I run on the data identifies the duplicates based on this field being the same, essentially trying to catch duplicate PKs before they go into the final table. I've edited the question to remove that confusion. – Skalrus Jun 23 '23 at 14:38