SQL Server query: Union vs Distinct union all performance

Question

Does SQL have a difference in performance between these two statements?

SELECT distinct 'A' as TableName, Col1, Col2, Col3 FROM A 
UNION ALL 
SELECT distinct 'B' as TableName, Col1, Col2, Col3 from B

versus

SELECT 'A' as TableName, Col1, Col2, Col3 FROM A 
UNION 
SELECT 'B' as TableName, Col1, Col2, Col3 from B

The difference between this and similar questions such as UNION vs DISTINCT in performance is that I can confirm ahead of time that the individual tables I'm using won't have any duplicate records between them, only within the individual tables.

The execution plans look the same to me, in that it sorts the individual tables before concatenating. However, if I remove the scalar from them both, the plan of the UNION ALL stays basically the same but the UNION changes to concatenating before the distinct. I'll be concatenating about 20 tables together, and it's not clear whether doing 20 individual DISTINCTs is faster than doing one large DISTINCT at the end, since I can still confirm that the tables would not share any duplicates between them (only within the same table).

Put both queries in one session, activate Execution Plan and execute. Compare percents in actual execution plan — DimaSUN, Jan 30 '16 at 00:28
That's something I did, but I only had a few thousand records at the time - not enough to test for a difference in actual total execution time, let alone to adjust the number of rows, number of columns, data types of the columns, etc, to get a general answer. And as I said, the execution plans I ran looked the same. Even if it wasn't, one going down and the other going up doesn't mean anything as far as total execution time. — ChristopherBass, Jan 31 '16 at 14:48

score 2 · Accepted Answer · answered Jan 29 '16 at 20:48

DISTINCT is not necessarily implemented by sort, it can also be implemented by hashing.

Both of these are memory consuming operations and reducing the size of data being distinctified can help reduce the amount of required memory which is good for concurrency.

The algorithmic complexity of sorting is n log n, meaning that the work required grows linearitmically as n grows. On that basis sorting 10 smaller sets of size s should generally be fast than sorting one larger set of size 10*s.

score 0 · Answer 2 · answered Jan 29 '16 at 20:17

Let's not talk about SQL for a minute.

Case 1: Say, there is a list of 100 numbers.

List: 1,2,3,4,....60 and then 61 repeats 40 times.

The list is not arranges and you don't know this before hand. Now you are trying to search the unique values, from the list of 100 numbers and then sort them.

Case 2: As you said, there are two lists with no duplicate records between them.

List 1: 1,2,3,4,....60
List 2: 61,61,61,61... 40 times

It satisfies the condition you mentioned. List one, similarly, has the numbers in random order. But now you are searching the unique values, from the list of 60 and not a larger set of 100 numbers and another list from where you will get 61.

Coming to SQL, it all depends on size of data you have in each individual table, and may be some other factors.

I accept it's not a complete answer, still hope this helps.

SQL Server query: Union vs Distinct union all performance

2 Answers2