19

How do I perform a DISTINCT operation on a single column after a UNION is performed?

T1
--
ID Value 
1  1
2  2
3  3

T2
--
ID Value
1  2
4  4
5  5


I am trying to return the table:

ID Value
1  1
2  2
3  3
4  4
5  5

I tried:

SELECT DISTINCT ID, Value 
FROM (SELECT*FROM T1 UNION SELECT*FROM T2) AS T3

This does not seem to work.

Code Magician
  • 23,217
  • 7
  • 60
  • 77
user1124535
  • 765
  • 3
  • 9
  • 15

4 Answers4

48

Why are you using a sub-query? This will work:

SELECT * FROM T1
UNION
SELECT * FROM T2

UNION removes duplicates. (UNION ALL does not)

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • 1
    Point was, OP wanted something called "one-field DISTINCT", and there's no such a concept. – alf Jan 09 '12 at 10:16
  • 1
    If you UNION records [1, 1] and [1, 2], you will get both in the result set. OP wanted no repeats from the first column. Obviously this answer was helpful to a lot of people, but I don't think it answers what was asked. – user3750325 Nov 09 '17 at 16:58
  • @user7733611 Actually, you're right now that I examine OP's example data. This query is the refactored equivalent of OP's query. – Bohemian Nov 09 '17 at 17:24
20

As far as I can say, there's no "one-column distinct": distinct is always applied to a whole record (unless used within an aggregate like count(distinct name)). The reason for this is, SQL cannot guess which values of Value to leave for you—and which to drop. That's something you need to define by yourself.

Try using GROUP BY to ensure ID is not repeated, and any aggregate (here MIN, as in your example it was the minimum that survived) to select a particular value of Value:

SELECT ID, min(Value) FROM (SELECT * FROM T1 UNION ALL SELECT * FROM T2) AS T3
GROUP BY ID

Should be exactly what you need. That is, it's not the same query, and there's no distinct—but it's a query which would return what's shown in the example.

alf
  • 8,377
  • 24
  • 45
6

I think that's what you meant:

SELECT * 
  FROM T1
UNION
SELECT * 
  FROM T2 
  WHERE (
    **ID
** NOT IN (SELECT ID FROM T1)
  );
jherran
  • 3,337
  • 8
  • 37
  • 54
KT8
  • 197
  • 3
  • 10
  • 1
    I really think this should be the accepted answer to the question. It lets you prioritize which table gets values chosen from instead of doing a MIN() with a GROUP BY. Depends on how OP wanted to choose the Value. – user3750325 Nov 09 '17 at 17:05
4

This - even though this thread is way old - might be a working solution for the question of the OP, even though it might be considered dirty.

We select all tuples from the first table, then adding (union) it with the tuples from the second table limited to those that doe not have the specific field matched in the first table.

SELECT * 
  FROM T1
UNION
SELECT * 
  FROM T2 
  WHERE (
    Value NOT IN (SELECT Value FROM T1)
  );
liselorev
  • 73
  • 4