Delete all but one duplicate record

Question

I have a table that is supposed to keep a trace of visitors to a given profile (user id to user id pair). It turns out my SQL query was a bit off and is producing multiple pairs instead of single ones as intended. With hindsight I should have enforced a unique constraint on each id+id pair.

Now, how could I go about cleaning up the table? What I want to do is delete all duplicate pairs and leave just one.

So for example change this:

23515 -> 52525 date_visited
23515 -> 52525 date_visited
23515 -> 52525 date_visited
12345 -> 54321 date_visited
12345 -> 54321 date_visited
12345 -> 54321 date_visited
12345 -> 54321 date_visited
23515 -> 52525 date_visited
...

Into this:

23515 -> 52525 date_visited
12345 -> 54321 date_visited

Update: Here is the table structure as requested:

Column        Type       Attributes  Null  Default            Extra
id            int(10)    UNSIGNED    No    None               AUTO_INCREMENT
profile_id    int(10)    UNSIGNED    No    0
visitor_id    int(10)    UNSIGNED    No    0
date_visited  timestamp              No    CURRENT_TIMESTAMP

What is the table structure, please? Is there a 3rd column to tie-break values? – gbn May 04 '11 at 11:31
@gbn: The table structure has been added (MySQL). The third column is to keep a trace of the last time a user visited a profile. The structure should probably be modified with a constraint on profile_id & visitor_id. P.S.: I don't have the SQL populating the table right now, but it's something along the lines of `if exists update timestamp if not create record`. – James P. May 04 '11 at 11:39
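
The constraint and the `if exists update timestamp if not create record` logic mentioned in the comment above could be sketched as follows. This is only an illustration, not the asker's actual code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the helper name `record_visit` is hypothetical, and the sample ids are taken from the question without knowing which side is the profile and which the visitor. In MySQL the closest equivalents would be a UNIQUE key on (profile_id, visitor_id) together with INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE my_tab (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        profile_id INTEGER NOT NULL DEFAULT 0,
        visitor_id INTEGER NOT NULL DEFAULT 0,
        date_visited TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (profile_id, visitor_id)  -- at most one row per pair
    )
    """
)


def record_visit(profile_id, visitor_id):
    """Insert the pair, or refresh its timestamp if the pair already exists.

    Hypothetical helper; the upsert syntax needs SQLite 3.24 or newer.
    """
    conn.execute(
        """
        INSERT INTO my_tab (profile_id, visitor_id) VALUES (?, ?)
        ON CONFLICT (profile_id, visitor_id)
        DO UPDATE SET date_visited = CURRENT_TIMESTAMP
        """,
        (profile_id, visitor_id),
    )


# Three visits for one pair and one visit for another pair.
for _ in range(3):
    record_visit(23515, 52525)
record_visit(12345, 54321)
conn.commit()

pairs = conn.execute(
    "SELECT profile_id, visitor_id FROM my_tab ORDER BY id"
).fetchall()
print(pairs)
```

With the constraint in place, repeated visits update the existing row instead of adding duplicates, so no cleanup is needed later.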

score 82 · Accepted Answer · edited Dec 04 '19 at 16:38

ANSI SQL Solution

Use group by in a subquery:

delete from my_tab where id not in 
(select min(id) from my_tab group by profile_id, visitor_id);

You need some kind of unique identifier (here, I'm using id).
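
To see the GROUP BY approach on the sample data from the question, here is a minimal, runnable sketch. It assumes SQLite (through Python's built-in sqlite3 module) rather than MySQL, simply because it needs no server; the table and column names follow the question, and the inserted rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_tab ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " profile_id INTEGER NOT NULL,"
    " visitor_id INTEGER NOT NULL,"
    " date_visited TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)

# Same shape as the example in the question: 3 + 4 + 1 rows, two distinct pairs.
rows = [(23515, 52525)] * 3 + [(12345, 54321)] * 4 + [(23515, 52525)]
conn.executemany(
    "INSERT INTO my_tab (profile_id, visitor_id) VALUES (?, ?)", rows
)

# Keep the lowest id of every (profile_id, visitor_id) group, delete the rest.
cur = conn.execute(
    "DELETE FROM my_tab WHERE id NOT IN "
    "(SELECT MIN(id) FROM my_tab GROUP BY profile_id, visitor_id)"
)
deleted = cur.rowcount
conn.commit()

remaining = conn.execute(
    "SELECT id, profile_id, visitor_id FROM my_tab ORDER BY id"
).fetchall()
print(deleted, remaining)
```

Swapping MIN(id) for MAX(id) would keep the most recently inserted row of each pair instead of the oldest one.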

MySQL Solution

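As pointed out by @JamesPoulson, the ANSI statement fails in MySQL with error 1093 (You can't specify target table 'my_tab' for update in FROM clause), because MySQL does not let a subquery select directly from the table being modified. The fix, as shown in James' answer, is to wrap the subquery in a derived table: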

delete from `my_tab` where id not in
( SELECT * FROM 
    (select min(id) from `my_tab` group by profile_id, visitor_id) AS temp_tab
);

answered May 04 '11 at 11:34 by Frank Schmitt · edited Dec 04 '19 at 16:38 by ahsteele

Great solution. I hadn't thought of using a group by (experience>knowledge). This displays a `Can't specify target in FROM clause` but there's a workaround for this (see my answer). – James P. May 04 '11 at 11:58
Note, this doesn't work in MySQL because it doesn't allow you to modify the table you're using in the inner select: `Error Code: 1093. You can't specify target table 'my_tab' for update in FROM clause` – Desty Mar 03 '16 at 13:44
I've updated the answer; I originally thought that people would read the comment / answer by @JamesPoulson and use their version, but apparently, that's not always the case. – Frank Schmitt May 29 '16 at 13:12
What if there is no id key? – user3467349 Dec 23 '16 at 12:14
Very elegant solution. Sometimes better use MAX instead of MIN, so you'll keep latest version of rows which are probably most correct. – Konstantin Svintsov Nov 17 '20 at 09:02

score 17 · Answer 2 · edited Dec 04 '19 at 16:41

Here's Frank Schmitt's solution with a small workaround that lets it run on MySQL: the inner query is wrapped in a derived table (aliased temp_tab), so the DELETE no longer selects directly from the table it modifies:

delete from `my_tab` where id not in
( SELECT * FROM 
    (select min(id) from `my_tab` group by profile_id, visitor_id) AS temp_tab
)

answered May 04 '11 at 12:02 by James P. · edited Dec 04 '19 at 16:41 by ahsteele

@FrankSchmitt it's perfectly fine :) – James P. May 30 '16 at 06:43

score 16 · Answer 3 · edited Dec 21 '17 at 12:16

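This works on SQL Server, which allows deleting through a CTE. Partition by the columns that define a duplicate (here profile_id and visitor_id); partitioning by a unique id would give every row RowNumber 1 and delete nothing:

WITH NewCTE AS
(
    -- number the rows inside each (profile_id, visitor_id) group
    SELECT *, ROW_NUMBER() OVER (PARTITION BY profile_id, visitor_id ORDER BY id) AS RowNumber
    FROM table_name
)
DELETE FROM NewCTE WHERE RowNumber > 1;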
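
The row-numbering idea can also be tried on an engine that does not support deleting through a CTE. The sketch below is an adaptation, not the statement from this answer: it assumes SQLite 3.25 or newer (for window functions) via Python's sqlite3 module, uses the table and columns from the question, and deletes by id from the base table, with the CTE only used to find the extra rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_tab ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " profile_id INTEGER NOT NULL,"
    " visitor_id INTEGER NOT NULL)"
)
rows = [(23515, 52525)] * 3 + [(12345, 54321)] * 4 + [(23515, 52525)]
conn.executemany(
    "INSERT INTO my_tab (profile_id, visitor_id) VALUES (?, ?)", rows
)

# Number the rows of each pair by ascending id; every row numbered 2 or
# higher is a duplicate of the first row in its group.
conn.execute(
    """
    WITH ranked AS (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY profile_id, visitor_id ORDER BY id
               ) AS rn
        FROM my_tab
    )
    DELETE FROM my_tab WHERE id IN (SELECT id FROM ranked WHERE rn > 1)
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT id, profile_id, visitor_id FROM my_tab ORDER BY id"
).fetchall()
print(remaining)
```

Changing the window to ORDER BY id DESC would keep the newest row of each pair rather than the oldest.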

answered Dec 21 '17 at 11:14 by Vik Wilder · edited Dec 21 '17 at 12:16 by Przemek Marcinkiewicz

This answer is the best if you do not have a unique identifier in your table and don't want to create a temporary table. – Manuel Hoffmann Nov 06 '19 at 12:04

score 3 · Answer 4 · answered May 04 '11 at 11:35

1. Select all unique rows
2. Copy them to a new temp table
3. Truncate original table
4. Copy temp table data to original table

That's what I'd do. I'm not sure if there's 1 query that would do all this for you.
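
The four steps above might look like this in practice. This is a sketch under several assumptions: SQLite via Python's sqlite3 module instead of MySQL, DELETE FROM in place of TRUNCATE (which SQLite lacks), a hypothetical temp table name `my_tab_dedup`, made-up timestamps, and the choice to keep the latest date_visited for each pair as the "unique" row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_tab ("
    " id INTEGER PRIMARY KEY,"
    " profile_id INTEGER NOT NULL,"
    " visitor_id INTEGER NOT NULL,"
    " date_visited TEXT NOT NULL)"
)
rows = [
    (23515, 52525, "2011-05-01 10:00:00"),
    (23515, 52525, "2011-05-02 10:00:00"),
    (12345, 54321, "2011-05-01 09:00:00"),
    (12345, 54321, "2011-05-03 09:00:00"),
    (23515, 52525, "2011-05-04 10:00:00"),
]
conn.executemany(
    "INSERT INTO my_tab (profile_id, visitor_id, date_visited) VALUES (?, ?, ?)",
    rows,
)

# Steps 1 and 2: one row per pair, copied into a temporary table.
conn.execute(
    "CREATE TEMP TABLE my_tab_dedup AS "
    "SELECT profile_id, visitor_id, MAX(date_visited) AS date_visited "
    "FROM my_tab GROUP BY profile_id, visitor_id"
)
# Step 3: empty the original table.
conn.execute("DELETE FROM my_tab")
# Step 4: copy the de-duplicated rows back; fresh id values are assigned.
conn.execute(
    "INSERT INTO my_tab (profile_id, visitor_id, date_visited) "
    "SELECT profile_id, visitor_id, date_visited FROM my_tab_dedup"
)
conn.execute("DROP TABLE my_tab_dedup")
conn.commit()

remaining = conn.execute(
    "SELECT profile_id, visitor_id, date_visited FROM my_tab ORDER BY profile_id"
).fetchall()
print(remaining)
```

Because the rows are re-inserted, they get new id values, so this variant is a poor fit if other tables reference my_tab.id.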

answered May 04 '11 at 11:35 by gmadd

Using a temporary table is a good instinct and is actually necessary. It's probably a better-suited approach if there's a lot of data. – James P. May 04 '11 at 12:03

score -4 · Answer 5 · edited Dec 17 '19 at 13:40

If you are using SQL, you can delete the duplicate rows manually, keeping one entry. Just follow this procedure:

1. Go into the table that contains the duplicate data.
2. Apply a filter to isolate the duplicate rows for one id.
3. Select all the rows you want to delete.
4. Press delete and save the result.
5. Repeat the process for each id that has duplicate entries.

It's a long procedure, but you can see the results immediately, in real time.

Hope this solution worked for you!!

answered Nov 23 '19 at 05:34 by akshay choukekar · edited Dec 17 '19 at 13:40 by Marcucciboy2

Lots of people that utilize these answers are working with millions (or even billions) of rows. It would take them weeks to do this manually. – Marcucciboy2 Dec 17 '19 at 13:42
What In the world – courtsimas Aug 25 '20 at 21:46
