32

I am trying to construct a single SQL statement that returns unique, non-null values from multiple columns all located in the same table.

 SELECT distinct tbl_data.code_1 FROM tbl_data
      WHERE tbl_data.code_1 is not null
 UNION
 SELECT tbl_data.code_2 FROM tbl_data
      WHERE tbl_data.code_2 is not null;

For example, tbl_data is as follows:

 id   code_1    code_2
 ---  --------  ----------
 1    AB        BC
 2    BC        
 3    DE        EF
 4              BC

For the above table, the SQL query should return all unique non-null values from the two columns, namely: AB, BC, DE, EF.

I'm fairly new to SQL. My statement above works, but is there a cleaner way to write this SQL statement, since the columns are from the same table?

Scotty.NET
  • 12,533
  • 4
  • 42
  • 51
regulus
  • 939
  • 3
  • 13
  • 21
  • That table structure give me a feeling your DB isn't normalized... – gdoron Jul 02 '12 at 23:39
  • 4
    You don't need the `distinct` in the first query - `union` will do that for you. – Blorgbeard Jul 02 '12 at 23:40
  • @gdoron: The codes correspond to various designations, which may indeed be repeated, i.e. a particular record may have BC and BC for codes 1 and 2. The designation of code 1 versus 2 is also significant. There is a third table look-up table for the various codes. Not the best, but that's what I'm dealing with. – regulus Jul 03 '12 at 00:08
  • 1
    @regulus I updated my answer with the execution plan for your query so you can compare. – Iain Samuel McLean Elder Jul 07 '12 at 21:57

4 Answers4

35

It's better to include code in your question, rather than ambiguous text data, so that we are all working with the same data. Here is the sample schema and data I have assumed:

CREATE TABLE tbl_data (
  id INT NOT NULL,
  code_1 CHAR(2),
  code_2 CHAR(2)
);

INSERT INTO tbl_data (
  id,
  code_1,
  code_2
)
VALUES
  (1, 'AB', 'BC'),
  (2, 'BC', NULL),
  (3, 'DE', 'EF'),
  (4, NULL, 'BC');

As Blorgbeard commented, the DISTINCT clause in your solution is unnecessary because the UNION operator eliminates duplicate rows. There is a UNION ALL operator that does not elimiate duplicates, but it is not appropriate here.

Rewriting your query without the DISTINCT clause is a fine solution to this problem:

SELECT code_1
FROM tbl_data
WHERE code_1 IS NOT NULL
UNION
SELECT code_2
FROM tbl_data
WHERE code_2 IS NOT NULL;

It doesn't matter that the two columns are in the same table. The solution would be the same even if the columns were in different tables.

If you don't like the redundancy of specifying the same filter clause twice, you can encapsulate the union query in a virtual table before filtering that:

SELECT code
FROM (
  SELECT code_1
  FROM tbl_data
  UNION
  SELECT code_2
  FROM tbl_data
) AS DistinctCodes (code)
WHERE code IS NOT NULL;

I find the syntax of the second more ugly, but it is logically neater. But which one performs better?

I created a sqlfiddle that demonstrates that the query optimizer of SQL Server 2005 produces the same execution plan for the two different queries:

The query optimizer produces this execution plan for both queries: two table scans, a concatenation, a distinct sort, and a select.

If SQL Server generates the same execution plan for two queries, then they are practically as well as logically equivalent.

Compare the above to the execution plan for the query in your question:

The DISTINCT clause makes SQL Server 2005 perform a redundant sort operation.

The DISTINCT clause makes SQL Server 2005 perform a redundant sort operation, because the query optimizer does not know that any duplicates filtered out by the DISTINCT in the first query would be filtered out by the UNION later anyway.

This query is logically equivalent to the other two, but the redundant operation makes it less efficient. On a large data set, I would expect your query to take longer to return a result set than the two here. Don't take my word for it; experiment in your own environment to be sure!

Dexygen
  • 12,287
  • 13
  • 80
  • 147
Iain Samuel McLean Elder
  • 19,791
  • 12
  • 64
  • 80
  • One of the Best Answers I think I have ever read on SO. Excellent detail and examples, and test results. – htm11h Oct 26 '16 at 13:54
5

try something like SubQuery:

SELECT derivedtable.NewColumn
FROM
(
    SELECT code_1 as NewColumn FROM tbl_data 
    UNION
    SELECT code_2 as NewColumn FROM tbl_data 
) derivedtable
WHERE derivedtable.NewColumn IS NOT NULL

The UNION already returns DISTINCT values from the combined query.

John Woo
  • 258,903
  • 69
  • 498
  • 492
0

Try this if you have more than two Columns:

CREATE TABLE #temptable (Name1 VARCHAR(25),Name2 VARCHAR(25))

INSERT INTO #temptable(Name1, Name2)
  VALUES('JON', 'Harry'), ('JON', 'JON'), ('Sam','harry')

SELECT t.Name1+','+t.Name2 Names  INTO #t FROM #temptable AS tSELECT DISTINCT ss.value FROM #t AS t
  CROSS APPLY STRING_SPLIT(T.Names,',') AS ss
Giacomo1968
  • 25,759
  • 11
  • 71
  • 103
Khati Lx
  • 31
  • 5
0

Union is applied wherever the row data required is similar in terms of type, values etc. It doesnt matter you have column in the same table or the other to retrieve from as the results would remain the same ( in one of the above answers already mentioned though).

As you didn't wanted duplicates theres no point using UNION ALL and use of distinct is simply unnecessary as union gives distinct data

Can create a view would be best choice as view is a virtual representation of the table. Modifications could be then done neatly on that view created

Create VIEW getData AS 
(
  SELECT distinct tbl_data.code_1 
    FROM tbl_data
    WHERE tbl_data.code_1 is not null
  UNION
  SELECT tbl_data.code_2 
    FROM tbl_data
    WHERE tbl_data.code_2 is not null
);
Giacomo1968
  • 25,759
  • 11
  • 71
  • 103
Himanshu
  • 3,830
  • 2
  • 10
  • 29