This is a data warehouse project in which I load a table whose columns each reference another table. Due to an error in the load process, many duplicate records were inserted (approximately 13,000), and since the rows have no unique identifier, the duplicates are exactly identical. Is there a way to delete just one of each pair of duplicate records, so that I don't have to delete everything and repeat the table load?
3 Answers
You can use row_number() and a CTE:
with cte as (
    select row_number() over(
        partition by col1, col2, ...
        order by (select null)
    ) rn
    from mytable
)
delete from cte where rn > 1;
The window function guarantees that the same number will not be assigned twice within a partition; you need to enumerate all of the table's columns in the partition by clause.
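As a concrete illustration, here is a sketch of the same pattern with hypothetical column names (coddimalumno, coddimcurso, fecha_carga stand in for your real column list):

    with cte as (
        select row_number() over(
            -- every column of the table must appear here, otherwise rows that
            -- differ in an omitted column would be treated as duplicates
            partition by coddimalumno, coddimcurso, fecha_carga
            order by (select null)
        ) rn
        from mytable
    )
    delete from cte where rn > 1;   -- keeps exactly one copy of each duplicate group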
If you are going to delete a significant proportion of the rows, then it might be simpler to empty the table and reload it from a de-duplicated copy:
create table tmptable as select distinct * from mytable;
truncate table mytable; -- back it up first!
insert into mytable select * from tmptable;
drop table tmptable;
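Whichever approach you choose, a quick check along these lines (again with the placeholder column list) should return no rows once the duplicates are gone:

    -- list every column here, exactly as in the partition by clause above
    select col1, col2, count(*) as copies
    from mytable
    group by col1, col2
    having count(*) > 1;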

GMB
You can make use of row_number() to delete the duplicate rows by first partitioning them and then ordering by one of the columns within that partition.
You have to list all of your columns in the partition by clause if the records are completely identical.
WITH CTE1 AS (
    SELECT A.*
         , ROW_NUMBER() OVER (PARTITION BY CODDIMALUMNO, (OTHER COLUMNS) ORDER BY CODDIMALUMNO) RN
    FROM TABLE1 A
)
DELETE FROM CTE1
WHERE RN > 1;
You can use row_number() and an updatable CTE:
with todelete as (
    select t.*,
           row_number() over (partition by . . . order by (select null)) as seqnum
    from t
)
delete from todelete
where seqnum > 1;
The . . . is for the columns that define duplicates.
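If you want to see which rows would be removed before actually deleting them, the same CTE can be queried first (a sketch, still using the . . . placeholder for the column list):

    with todelete as (
        select t.*,
               row_number() over (partition by . . . order by (select null)) as seqnum
        from t
    )
    select *            -- preview the extra copies before deleting them
    from todelete
    where seqnum > 1;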

Gordon Linoff