9

Please have these two types of query in your mind:

--query1
Select someFields
From someTables
Where someWhereClues
Union all
Select someFields
FROM some Tables
Where someWhereClues

--query2
Select * FROM (
    Select someFields 
    From someTables
    Union all 
    Select someFields
    FROM someTables
    ) DT
Where someMixedWhereClues

Note :
In both queries final result fields are same

I thought the 1st. query is faster or its performance is better!
But after some researches I confused by this note:

SQL Server (as a sample of RDBMS) first reads whole data then seek records. => so in both queries all records will read and seek.

Please Help me on my misunderstandings, and on if there is any other differences between query1 and query2 ?


Edit: adding sample plans:

select t.Name, t.type from sys.tables t where t.type = 'U'
union all
select t.Name, t.type from sys.objects t where t.type = 'U'

select * from (
    select t.Name, t.type from sys.tables t
    union all
    select t.Name, t.type from sys.objects t
    ) dt
where dt.type = 'U'

Execution Plans are: enter image description here enter image description here

both are same and 50%

shA.t
  • 16,580
  • 5
  • 54
  • 111
  • You can check what is happening by running an execution plan on both queries. – xlecoustillier Mar 17 '15 at 12:32
  • If you check execution plans and server statistics (time/IO) for your given data, then you can see which is better. Off the top of my head, the first one looks to be better, but the sql server optimizer could be smart enough to make both equal. Hence - looking at what it runs is the way to analyze. – Allan S. Hansen Mar 17 '15 at 12:32
  • I add execution plans too !!?? – shA.t Mar 17 '15 at 12:54
  • Execution is the same as you defining the same conditions to the WHERE clause. However your question was "Where someMixedWhereClues" – Dmitrij Kultasev Mar 17 '15 at 12:56
  • "someMixedWhereClues" is my mention of using aliases and don't worry about field names ;). – shA.t Mar 17 '15 at 13:00
  • 5
    This basically always boils down to some basic advice: In SQL, you should tell the system *what you want*, not *how to do it* and for most situations, if there are multiple ways to write the same query that is *logically* equivalent, the optimizer *should* produce the same plan for any of the queries. So just write the queries that you think read the best and only think about changing the query if the *measured* performance doesn't meet your requirements. – Damien_The_Unbeliever Apr 07 '15 at 13:50
  • @Damien_The_Unbeliever As your comment, Are Those queries same at all except for my read & write understanding? – shA.t Apr 07 '15 at 13:54
  • 1
    The pairs of queries in your question appear to be *logically* the same. I don't understand your phrasing about "read & write". One thing the optimizer tries to do is to "push" predicates (conditions) as far down as possible. If it can check the predicate as its scanning each basic table (rather than it needing to check something involving multiple tables) then it will perform those checks as it scans the table (or seeks within indexes). – Damien_The_Unbeliever Apr 07 '15 at 13:58
  • @Damien_The_Unbeliever As your comment `So just write the queries that you think read the best` I comment `read & write understanding` ;). My English is not so good, If it is wrong, pardon me ;). – shA.t Apr 07 '15 at 14:05

6 Answers6

5

The SQL Server query optimizer, optimizes both queries so you get nearly the same performance.

Nick N.
  • 12,902
  • 7
  • 57
  • 75
1

The first one cannot be slower. Here is the reasoning:

  • If the WHERE clauses in the first can efficiently use an INDEX, there will be fewer rows to collect together in the UNION. Fewer rows --> faster.
  • The second one does not have an INDEX on the UNION, hence the WHERE cannot be optimized in that way.

Here are things that could lead to the first being slower. But I see them as exceptions, not the rule.

  • A different amount of parallelism is achieved.
  • Different stuff happens to be cached at the time you run the queries.

Caveat: I am assuming all three WHERE clauses are identical (as your example shows).

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • `Fewer rows --> faster` In my mind it was like that but take a look at [this article about how SQL Server executes a Query](www.codeproject.com/Articles/630346/Understanding-how-SQL-Server-executes-a-query), my notes at questions come from there ;). – shA.t Apr 09 '15 at 05:17
  • Long article (and nice). What particular part of it is relevant here? – Rick James Apr 09 '15 at 15:29
1

As a rule of thumb, I will always consider using the first type of the query.

In made-up samples and queries with simple WHERE predicates both will use the same plan. But in a more complex query, with more complicated predicates, the optimizer might not come up with an equally efficient solution for the second type of query (it's just an optimizer, and is bound by resource and time constraints). The more complex the query is, the less chance is the optimizer finds the best execution plan (as it will eventually time-out and choose the least worst plan found so far). And it gets even worse if the predicates are ORed.

dean
  • 9,960
  • 2
  • 25
  • 26
1

SQLServer will optimize both of those queries down to the same thing, as shown in the execution plans you posted. It's able to do this because in this case the queries are fairly simple; in another case it's possible for it to turn out differently. As long as you're composing a query, you should try to follow the same general rules that the optimizer does, and filter as soon as possible to limit the resultset that returns. By telling it that you first want to only get 'U' records, and then combine those results, you will prepare the query for later revisions which could invalidate the optimizer's choices which led to the same execution plan.

In short, you don't have to force simple queries to be optimal, but it's a good habit to have, and it will help when creating more complex queries.

DrewJordan
  • 5,266
  • 1
  • 25
  • 39
0

In my practice 1st option was never slower than the 2nd. I think that optimizer is smart enough to optimize these plans more or less in the same manner. However I made some tests and the 1st option was always better. For example:

CREATE TABLE #a ( a INT, b INT );

WITH Numbers ( I ) AS (
    SELECT 1000

    UNION ALL

    SELECT I + 1
    FROM Numbers
    WHERE I < 5000
)
INSERT INTO #a ( a )
SELECT I
FROM Numbers
ORDER BY CRYPT_GEN_RANDOM(4)
OPTION ( MAXRECURSION 0 );

WITH Numbers ( I ) AS (
    SELECT 1000

    UNION ALL

    SELECT I + 1
    FROM Numbers
    WHERE I < 5000
)
INSERT INTO #a ( b )
SELECT I
FROM Numbers
ORDER BY CRYPT_GEN_RANDOM(4)
OPTION ( MAXRECURSION 0 );

SELECT a, b
FROM #a
WHERE a IS NOT NULL
UNION ALL
SELECT a, b
FROM #a
WHERE b IS NOT NULL

SELECT *
FROM (
    SELECT a, b
    FROM #a
    UNION ALL
    SELECT a, b
    FROM #a
    ) c
WHERE a IS NOT NULL
    OR b IS NOT NULL

The result is 47% vs 53%

shA.t
  • 16,580
  • 5
  • 54
  • 111
Dmitrij Kultasev
  • 5,447
  • 5
  • 44
  • 88
  • As I found the result of `Union All` is better that using `OR` change your queries to `WHERE a IS NOT NULL` clue at all and then check them, And If you checked you results you could see that your first query returns `8002` rows and the second one returns `16004` rows, So those are not same. – shA.t Apr 07 '15 at 03:49
0

In my experience, there is no straightforward answer to this and it varies based on the nature of the underlying query. As you have shown, the optimizer comes up with the same execution plan in both of those scenarios, however that is not always the case. The performance is usually similar, but sometimes the performance can vary drastically depending on the query. In general I only take a closer look at it when performance is bad for no good reason.

Necreaux
  • 9,451
  • 7
  • 26
  • 43