The most efficient way to delete all duplicate rows from a table?

Question

I have a table:

| foo | bar |
+-----+-----+
| a   | abc |
| b   | def |
| c   | ghi |
| d   | jkl |
| a   | mno |
| e   | pqr |
| c   | stu |
| f   | vwx |
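
For anyone who wants to reproduce this, here is a minimal setup sketch. The table name `TableName` is borrowed from the answers below, and the `VARCHAR(10)` column types are assumptions, since the question does not state them.

```sql
-- Assumed schema: the question gives only the column names foo and bar.
CREATE TABLE TableName (
    foo VARCHAR(10) NOT NULL,
    bar VARCHAR(10) NOT NULL
);

-- The eight sample rows from the table above.
INSERT INTO TableName (foo, bar) VALUES
    ('a', 'abc'),
    ('b', 'def'),
    ('c', 'ghi'),
    ('d', 'jkl'),
    ('a', 'mno'),
    ('e', 'pqr'),
    ('c', 'stu'),
    ('f', 'vwx');
```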

I want to delete every row whose foo value appears more than once, so that the table looks like this:

| foo | bar |
+-----+-----+
| b   | def |
| d   | jkl |
| e   | pqr |
| f   | vwx |

What is the most efficient way to do this?

John Woo · Accepted Answer · 2013-04-08T03:32:12.043

9

You can LEFT JOIN the table to a subquery that returns only the foo values occurring exactly once. Rows with no match in the subquery are the ones whose foo is duplicated, and those are the rows that get deleted. For example:

DELETE  a
FROM    TableName a
        LEFT JOIN
        (
            SELECT  foo
            FROM    TableName
            GROUP   BY Foo
            HAVING  COUNT(*) = 1
        ) b ON a.Foo = b.Foo
WHERE   b.Foo IS NULL

SQLFiddle Demo
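
Before running the DELETE on a big table, it may be worth previewing which rows it would remove. A sketch of the same join written as a SELECT, with nothing new except the `a.*` column list:

```sql
-- Same LEFT JOIN and filter as the DELETE, but only reads the rows.
SELECT  a.*
FROM    TableName a
        LEFT JOIN
        (
            -- foo values that occur exactly once (the rows to keep)
            SELECT  foo
            FROM    TableName
            GROUP   BY foo
            HAVING  COUNT(*) = 1
        ) b ON a.foo = b.foo
WHERE   b.foo IS NULL;   -- no match means foo is duplicated
```

With the sample data from the question, this should list the four rows whose foo is `a` or `c`.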

For faster performance, add an index on column Foo.

ALTER TABLE tableName ADD INDEX(foo)
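
To check whether the index exists and is being considered, something like the following can help. This is only a sketch: the plan MySQL actually chooses depends on the server version and the data, so no particular output is implied.

```sql
-- List the indexes defined on the table; the new index on foo should appear.
SHOW INDEX FROM TableName;

-- Show the execution plan for the grouping subquery used in the DELETE.
EXPLAIN
SELECT  foo
FROM    TableName
GROUP   BY foo
HAVING  COUNT(*) = 1;
```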

edited Apr 08 '13 at 03:32

answered Apr 07 '13 at 17:25

John Woo

This works perfectly, but it's too slow (I have a very big table). – Andrew Shulgin Apr 08 '13 at 02:39
Add an index on the column so it performs faster, for example `ALTER TABLE tableName ADD INDEX(foo)`, and check the performance. – John Woo Apr 08 '13 at 03:01
Thanks, but I've already done that. I see this is the fastest way to do it anyway. – Andrew Shulgin Apr 08 '13 at 04:16

score 8 · Answer 2 · answered Apr 08 '13 at 03:40

Using EXISTS:

DELETE a
  FROM TableName a
 WHERE EXISTS (SELECT NULL
                 FROM TableName b
                WHERE b.foo = a.foo
             GROUP BY b.foo
               HAVING COUNT(*) > 1)

Using IN:

DELETE a
  FROM TableName a
 WHERE a.foo IN (SELECT b.foo
                   FROM TableName b
               GROUP BY b.foo
                 HAVING COUNT(*) > 1)
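
Depending on the MySQL version, a DELETE whose subquery reads from the table being deleted from can be rejected with error 1093 ("You can't specify target table ... for update in FROM clause"). If that happens, one common workaround is to wrap the duplicate list in a derived table. A sketch of the IN form, where the alias `dup` is just an illustrative name:

```sql
-- Same IN logic as above; the inner GROUP BY query is wrapped in a
-- derived table (alias dup, a hypothetical name) that the outer IN reads.
DELETE a
  FROM TableName a
 WHERE a.foo IN (SELECT dup.foo
                   FROM (SELECT b.foo
                           FROM TableName b
                       GROUP BY b.foo
                         HAVING COUNT(*) > 1) AS dup);
```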

OMG Ponies

If I am correct, the EXISTS version you have written here is significantly faster than the IN version. With this in mind, is there any argument for the IN version? – usumoio Dec 06 '13 at 18:52
