What's the best way to union two queries with distinct values in a single column, prioritizing the first query?

Question

Needs to be database-agnostic between Oracle and SQL server, although I wouldn't mind hearing SQL server-specific examples as well.

I'm sure the title isn't clear at all, so let me explain what I'm thinking. I'm thinking of two queries. The first might pull in a bunch of data from a given table, including primary keys. The second would just pull in every primary key and leave all other columns blank.

Then I'd want to union them together in such a way that whenever a primary key is missing in the first query, the row from the second query gets pulled in. Otherwise, if the primary key exists in the first query, the row from the second query is ignored.

Quick example:

First query pulls in two columns (first is primary key):

1   1

2   1

Second query pulls in :

1   NULL

2   NULL

3   NULL

So I would want the whole query to pull in:

1  1

2  1

3  NULL

What's the best way to pull this off, performance-wise? Consider an example where there might be a very large number of rows and columns, and the first query might be pretty performance-intensive (although the second of course should always be straightforward, just pulling in primary keys from a list and filling the rest of the columns out with either NULLs or static values).

SSIS Lookup transformation is probably a good approach. It can compare data sources and select data based on condition. SSIS Oracle Attunity is really fast to load data to or from Oracle but it requires Enterprise SQL edition. It is going to be very slow via linked server. — sqlint, Sep 24 '13 at 22:52
Related: [SQL UNION ALL to eliminate duplicates](https://stackoverflow.com/questions/41729082/sql-union-all-to-eliminate-duplicates) — Vadzim, Apr 27 '20 at 20:41

score 2 · Answer 1 · answered Sep 24 '13 at 22:40

It sounds to me that you want to use a FULL OUTER JOIN on the two tables or queries:

select 
  coalesce(q1.col1, q2.col1) col1,
  coalesce(q1.col2, q2.col2) col2
from query1 q1
full outer join query2 q2
  on q1.col1 = q2.col1;

See SQL Fiddle with Demo.

This will join the two queries on your primary key column (col1 in the sample query), then you can use COALESCE on the columns to return the first non-null value for col1, col2, etc.

score 1 · Answer 2 · answered Sep 24 '13 at 22:40

1

You can't use a union since SQL will consider 1, 2 and 1, NULL to be distinct.

Not knowing your schema, I would try the following in psuedo code:

select *
from query_1
union all
select primary_key
from query_2
where query_2.PK not in(select PK from query_1)

This will only return the primary keys in query_2 that are not in query_1 and get you a clean union where the query_1 results are prioritized over query_2 results. Selecting just the primary keys for the first query should be quick and easy, but if that isn't the case let me know and I can try to come up with a more complicated query given your schema.

answered Sep 24 '13 at 22:40

Vulcronos

3,428
3
16
24

2

I like the approach, but "not in" tends to be slow. Both sql server and oracle support the minus keyword so that part could be where query_2.pk in (select pk from query_2 minus select pk from query_1) – Dan Bracuk Sep 24 '13 at 22:52
@DanBracuk That is true. Not sure if that would run faster with his data but it has a chance. However, MINUS is Oracle only, it is EXCEPT in SQL Server. – Vulcronos Sep 24 '13 at 22:54
So I thought of this, but like I said above, query 1 might be very performance-intensive, and now we're running it twice PLUS doing the not in. – J S Sep 25 '13 at 14:02
@JS I was hoping just selecting the primary keys would be nice and quick. If not, try bluefeet's solution. If that doesn't work let us know and we can consider a common table expression or a temp table. That would let you cache the results of query_1 and resuse the data from the temp table for the not in instead of running it twice. – Vulcronos Sep 25 '13 at 14:11

What's the best way to union two queries with distinct values in a single column, prioritizing the first query?

2 Answers2