7

I have a table:

| foo | bar |
+-----+-----+
| a   | abc |
| b   | def |
| c   | ghi |
| d   | jkl |
| a   | mno |
| e   | pqr |
| c   | stu |
| f   | vwx |

I want to delete all rows containing duplicates by foo column so that the table should look like this:

| foo | bar |
+-----+-----+
| b   | def |
| d   | jkl |
| e   | pqr |
| f   | vwx |

What is the most efficient way to do this?

Unihedron
  • 10,902
  • 13
  • 62
  • 72

2 Answers2

9

You can join a table from a subquery which returns only unique foo using LEFT JOIN. The rows that did not have a match on the subquery will be deleted as you desired, example

DELETE  a
FROM    TableName a
        LEFT JOIN
        (
            SELECT  foo
            FROM    TableName
            GROUP   BY Foo
            HAVING  COUNT(*) = 1
        ) b ON a.Foo = b.Foo
WHERE   b.Foo IS NULL

For faster performance, add an index on column Foo.

ALTER TABLE tableName ADD INDEX(foo)
John Woo
  • 258,903
  • 69
  • 498
  • 492
8

Using EXISTS:

DELETE a
  FROM TableName a
 WHERE EXISTS (SELECT NULL
                 FROM TableName b
                WHERE b.foo = a.foo
             GROUP BY b.foo
               HAVING COUNT(*) > 1)

Using IN:

DELETE a
  FROM TableName a
 WHERE a.foo IN (SELECT b.foo
                   FROM TableName b
               GROUP BY b.foo
                 HAVING COUNT(*) > 1)
OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
  • If I am correct the exists version you have written here is significantly faster than the in version. With this in mind, is there any argument for the in version? – usumoio Dec 06 '13 at 18:52