SELECT UNION as DISTINCT

Question

How do I perform a DISTINCT operation on a single column after a UNION is performed?

T1
--
ID Value 
1  1
2  2
3  3

T2
--
ID Value
1  2
4  4
5  5

I am trying to return the table:

ID Value
1  1
2  2
3  3
4  4
5  5

I tried:

SELECT DISTINCT ID, Value
FROM (SELECT * FROM T1 UNION SELECT * FROM T2) AS T3

This does not seem to work.

You are not giving us all the details: should the value always be the same as field 1, the min value, the max value, a random value...? Anyway, DISTINCT applies to all the fields, not just one. – Itay Moav -Malimovka, Jan 09 '12 at 00:50

score 48 · Answer 1 · answered Jan 09 '12 at 01:13

Why are you using a sub-query? This will work:

SELECT * FROM T1
UNION
SELECT * FROM T2

UNION removes duplicates. (UNION ALL does not)
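
To see what this does with the question's sample data, here is a minimal sketch that runs the query through Python's built-in sqlite3 module. The in-memory database and the INTEGER column types are assumptions; the question does not say which database is in use. Note that UNION compares whole rows, so ID 1 still comes back twice, once per distinct Value.

```python
import sqlite3

# In-memory database holding the sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Value INTEGER);
    CREATE TABLE T2 (ID INTEGER, Value INTEGER);
    INSERT INTO T1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO T2 VALUES (1, 2), (4, 4), (5, 5);
""")

# UNION drops rows that are identical in every column, nothing more.
union_rows = conn.execute(
    "SELECT * FROM T1 UNION SELECT * FROM T2 ORDER BY ID, Value"
).fetchall()

print(union_rows)
# [(1, 1), (1, 2), (2, 2), (3, 3), (4, 4), (5, 5)]
```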

Bohemian

Point was, OP wanted something called "one-field DISTINCT", and there's no such a concept. – alf Jan 09 '12 at 10:16
If you UNION records [1, 1] and [1, 2], you will get both in the result set. OP wanted no repeats from the first column. Obviously this answer was helpful to a lot of people, but I don't think it answers what was asked. – user3750325 Nov 09 '17 at 16:58
@user3750325 Actually, you're right now that I examine OP's example data. This query is the refactored equivalent of OP's query. – Bohemian Nov 09 '17 at 17:24

alf · Accepted Answer · 2016-06-08T06:32:04.093

20

As far as I can tell, there's no "one-column distinct": DISTINCT is always applied to a whole record (unless used within an aggregate like count(distinct name)). The reason is that SQL cannot guess which values of Value to keep for you and which to drop. That's something you need to define yourself.

Try using GROUP BY to ensure ID is not repeated, and any aggregate (here MIN, as in your example it was the minimum that survived) to select a particular value of Value:

SELECT ID, min(Value) FROM (SELECT * FROM T1 UNION ALL SELECT * FROM T2) AS T3
GROUP BY ID

This should be exactly what you need. It's not the same query, and there's no DISTINCT, but it returns what's shown in the example.
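
A quick way to check this against the sample data is the sketch below, which runs the GROUP BY query through Python's built-in sqlite3 module. The in-memory database, the INTEGER column types, and the `AS Value` alias are assumptions added for illustration.

```python
import sqlite3

# In-memory database holding the sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Value INTEGER);
    CREATE TABLE T2 (ID INTEGER, Value INTEGER);
    INSERT INTO T1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO T2 VALUES (1, 2), (4, 4), (5, 5);
""")

# One row per ID; MIN decides which Value survives when an ID repeats.
grouped_rows = conn.execute("""
    SELECT ID, MIN(Value) AS Value
    FROM (SELECT * FROM T1 UNION ALL SELECT * FROM T2) AS T3
    GROUP BY ID
    ORDER BY ID
""").fetchall()

print(grouped_rows)
# [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
```

Swapping MIN for MAX would keep (1, 2) instead of (1, 1), which is the choice SQL cannot make for you.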

edited Jun 08 '16 at 06:32

answered Jan 09 '12 at 00:40

alf

you sure that's the same query? – Mitch Wheat Jan 09 '12 at 00:42
I'd suggest using `UNION ALL` in the subquery as there is no point in doing a `DISTINCT` twice. – Code Magician Jan 09 '12 at 00:44
@MitchWheat I'm sure it's not—but it's a query which would return what's shown in the example. – alf Jan 09 '12 at 00:44
@MitchWheat: It isn't, but it'll do what the OP specifically said he wanted in his "I'm trying to return the table" table. – Amadan Jan 09 '12 at 00:45
On that size data set, I'm not sure it's 100% valid. – Mitch Wheat Jan 09 '12 at 00:46
I just tried alf's code. The Group By function does the trick – user1124535 Jan 09 '12 at 00:48

score 6 · Answer 3 · edited Apr 19 '16 at 08:48

I think that's what you meant:

SELECT * 
  FROM T1
UNION
SELECT * 
  FROM T2 
  WHERE (
    ID NOT IN (SELECT ID FROM T1)
  );
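
As a rough check, the sketch below runs this approach on the question's sample data using Python's built-in sqlite3 module (the in-memory database and INTEGER column types are assumptions). It also runs the mirrored query, with T2 listed first, to show that whichever table comes first decides the Value for a shared ID.

```python
import sqlite3

# In-memory database holding the sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Value INTEGER);
    CREATE TABLE T2 (ID INTEGER, Value INTEGER);
    INSERT INTO T1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO T2 VALUES (1, 2), (4, 4), (5, 5);
""")

# T1 has priority: T2 contributes rows only for IDs that T1 lacks.
t1_first = conn.execute("""
    SELECT * FROM T1
    UNION
    SELECT * FROM T2 WHERE ID NOT IN (SELECT ID FROM T1)
    ORDER BY ID
""").fetchall()

# Mirrored query: T2 has priority, so ID 1 keeps T2's Value.
t2_first = conn.execute("""
    SELECT * FROM T2
    UNION
    SELECT * FROM T1 WHERE ID NOT IN (SELECT ID FROM T2)
    ORDER BY ID
""").fetchall()

print(t1_first)  # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
print(t2_first)  # [(1, 2), (2, 2), (3, 3), (4, 4), (5, 5)]
```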

edited Apr 19 '16 at 08:48

jherran

answered Apr 19 '16 at 07:53

KT8

I really think this should be the accepted answer to the question. It lets you prioritize which table gets values chosen from instead of doing a MIN() with a GROUP BY. Depends on how OP wanted to choose the Value. – user3750325 Nov 09 '17 at 17:05

score 4 · Answer 4 · answered Jul 31 '14 at 10:31

Even though this thread is quite old, this might be a working solution to the OP's question, although it might be considered dirty.

We select all tuples from the first table, then union them with the tuples from the second table, limited to those that do not have the specific field matched in the first table.

SELECT * 
  FROM T1
UNION
SELECT * 
  FROM T2 
  WHERE (
    Value NOT IN (SELECT Value FROM T1)
  );
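
One thing to keep in mind is that filtering on Value drops any T2 row whose Value appears anywhere in T1, whatever its ID. The sketch below, using Python's built-in sqlite3 module, shows this with one extra row (6, 3) added to T2. That row is hypothetical and not part of the question's data; the in-memory database and INTEGER column types are also assumptions.

```python
import sqlite3

# Sample tables from the question, plus a hypothetical extra T2 row (6, 3).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Value INTEGER);
    CREATE TABLE T2 (ID INTEGER, Value INTEGER);
    INSERT INTO T1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO T2 VALUES (1, 2), (4, 4), (5, 5), (6, 3);
""")

# T2 rows are kept only when their Value does not occur in T1.
by_value = conn.execute("""
    SELECT * FROM T1
    UNION
    SELECT * FROM T2 WHERE Value NOT IN (SELECT Value FROM T1)
    ORDER BY ID
""").fetchall()

print(by_value)
# [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
```

ID 6 is missing from the result because its Value, 3, already exists in T1. If IDs should drive the de-duplication, filter on ID instead.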
