7

I have two somewhat big (4+ million records) tables with identical structure, and they have about 300k duplicated rows. I'd like to DELETE the duplicate rows using the DELETE IN syntax.

I've already done it using the MERGE statement (available only on 2008 or newer, so I can't use it since I'm still running 2005), and the DELETE EXISTS, but I'm running into some trouble getting DELETE IN to work.

The problem that I'm having with DELETE IN is that my big table has a composite primary key, meaning I can only identify unique rows using all those columns together.

Is it possible in T-SQL to have multiple expressions as parameters to the IN clause? Something like:

DELETE FROM MyBigTable
WHERE ([Column1], [Column2], [Column3]) IN
    (SELECT [Column1],
            [Column2],
            [Column3]
     FROM MyBigTable
     INTERSECT
     SELECT [Column1],
            [Column2],
            [Column3]
     FROM MyOtherBigTable)
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
ivanmp
  • 519
  • 1
  • 5
  • 13
  • 5
    This is valid standard SQL syntax (if you replaced those square brackets with double quotes!) but is not yet supported in SQL Server, You can vote [here](http://connect.microsoft.com/SQLServer/feedback/details/299231/add-support-for-ansi-standard-row-value-constructors) for its inclusion. – onedaywhen Mar 12 '12 at 14:44
  • If you want this syntax supported in a future version of SQL Server, please vote & comment: http://connect.microsoft.com/SQLServer/feedback/details/299231/add-support-for-ansi-standard-row-value-constructors – Aaron Bertrand Mar 12 '12 at 22:55

1 Answers1

14

You can just do a JOIN for this:

DELETE A
FROM MyBigTable A
INNER JOIN MyOtherBigTable B
ON A.Column1 = B.Column1 AND A.Column2 = B.Column2 AND A.Column3 = B.Column3
Lamak
  • 69,480
  • 12
  • 108
  • 116