4

I saw the following posted on a basic way to de-dup entries, without explanation of how it works. I see that it works, but I want to know the workings of how it works and the process in which it evaluates. Below I will post the code, and my thoughts. I am hoping that somebody can tell me if my thought process on how this is evaluated step by step is correct, or if I am off, can somebody please break it down for me.

CREATE TABLE #DuplicateRcordTable (Col1 INT, Col2 INT)
INSERT INTO #DuplicateRcordTable
SELECT 1, 1
UNION ALL
SELECT 1, 1
UNION ALL
SELECT 1, 1
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 2 
UNION ALL
SELECT 1, 3
UNION ALL
SELECT 1, 4
GO

This returns a basic table:

Table1

Then this code is used to exclude duplicates:

SELECT col1,col2
FROM #DuplicateRcordTable
EXCEPT
SELECT col1,col2
FROM #DuplicateRcordTable WHERE 1=0

My understanding is that where 1=0 creates a "temp" table structured the same but has no data.

Does this code then start adding data to the new empty table?

For example does it look at the first Col1, Col2 pair of 1,1 and say "I don't see it in the table" so it adds it to the "temp" table and end result, then checks the next row which is also 1,1 and then sees it already in the "temp" table so its not added to the end result....and so on through the data.

Dale K
  • 25,246
  • 15
  • 42
  • 71
Nick A
  • 126
  • 7
  • group by is another way (aside from Gordon's method of DISTINCT which is more clear) https://sqlperformance.com/2017/01/t-sql-queries/surprises-assumptions-group-by-distinct – S3S Dec 31 '18 at 20:57

2 Answers2

4

EXCEPT is a set operation that removes duplicates. That is, it takes everything in the first table that is not in the second and then does duplicate removal.

With an empty second set, all that is left is the duplicate removal.

Hence,

SELECT col1, col2
FROM #DuplicateRcordTable
EXCEPT
SELECT col1, col2
FROM #DuplicateRcordTable
WHERE 1 = 0;

is equivalent to:

SELECT DISTINCT col1, col2
FROM #DuplicateRcordTable

This would be the more typical way to write the query.

This would also be equivalent to:

SELECT col1,col2
FROM #DuplicateRcordTable
UNION
SELECT col1,col2
FROM #DuplicateRcordTable
WHERE 1 = 0;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • But that doesn't explain how the `1 = 0` takes effect? – Dale K Dec 31 '18 at 20:48
  • 1
    @DaleBurrell . . . 1 = 0 removes all rows from the second query, hence the empty set. – Gordon Linoff Dec 31 '18 at 20:51
  • @GordonLinoff Is it ever necessary to use the Except function using the same table twice? I understand how it works with separate tables, but if you can just use distinct, then why use it on the same table. Also why is the WHERE 1=0 necessary? Thank you! – Nick A Dec 31 '18 at 21:07
  • 1
    If you use DISTINCT then no @NickA but otherwise you’d have to do something to return an empty result set (if you use except), like select null, null or use the 1=0 method... or you could just remove the WHERE clause but this would be slower. Both are ambiguous ways to achieve DISTINCT. Another method is GROUP BY that I posted in the main comments – S3S Dec 31 '18 at 21:16
  • @NickA . . . I don't know what your question is. There may be times where `except` is useful on the same table. There is no reason to use `except` when your intention is to remove duplicates. There is simpler syntax for that purpose. – Gordon Linoff Dec 31 '18 at 22:36
1

The reason that this works is due to the definition of EXCEPT which according to the MS docs is

EXCEPT returns distinct rows from the left input query that aren't output by the right input query.

The key word here being distinct. Putting where 1 = 0 makes the second query return no results, but the EXCEPT operator itself then reduces the rows from the left query down to those which are distinct.

As @Gordon Linoff says in his answer, there is a simpler, more straightforward way to accomplish this.

The fact that the example uses the same table in the left and right queries could be misleading, the following query will accomplish the same thing, so long as the values in the right query don't exist in the left:

SELECT col1, col2
FROM @DuplicateRecordTable
EXCEPT
SELECT -1, -1

REF: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql?view=sql-server-2017

Dale K
  • 25,246
  • 15
  • 42
  • 71
  • I think I'm missing something in how your answer differs from Gordon's. It is the last day of the year, so my mind ins't quite as sharp as usual. – S3S Dec 31 '18 at 21:00
  • @scsimon - the OP was specifically interested in why the `1=0` clause was used, which I felt got lost in the overall reply from Gordon. The question wasn't trying to find ways to remove duplicates, rather to understand why this method works. – Dale K Dec 31 '18 at 21:02
  • Ah ok. I thought his second and third sentence was for that but more explanation never hurts. – S3S Dec 31 '18 at 21:03