7

Does anyone know why sql server chooses to query the table 'building' twice? Is there any explanation? Can it be done with only one table seek?

This is the code sample:

DECLARE @id1stBuild INT = 1
    ,@number1stBuild INT = 2
    ,@idLastBuild INT = 5
    ,@numberLastBuild INT = 1;
DECLARE @nr TABLE (nr INT);

INSERT @nr
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

CREATE TABLE building (
    id INT PRIMARY KEY identity(1, 1)
    ,number INT NOT NULL
    ,idStreet INT NOT NULL
    ,surface INT NOT NULL
    )

INSERT INTO building (number,idStreet,surface)
SELECT bl.b
    ,n.nr
    ,abs(convert(BIGINT, convert(VARBINARY, NEWID()))) % 500
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY n1.nr) b
    FROM @nr n1
    CROSS JOIN @nr n2
    CROSS JOIN @nr n3
    ) bl
CROSS JOIN @nr n

--***** execution plan for the select below
SELECT *
FROM building b
WHERE b.id = @id1stBuild
    AND b.number = @number1stBuild
    OR b.id = @idLastBuild
    AND b.number = @numberLastBuild

DROP TABLE building

The execution plan for this is always the same: Two Clustered Index Seek unified through Merge Join (Concatenation). The rest is less important. Here is the execution plan:

enter image description here

Jørgen R
  • 10,568
  • 7
  • 42
  • 59
Emarian
  • 73
  • 3
  • 2
    Your where clause is missing a bracket. I think that might be the cause of this... `(b.id=@id1stBuild AND b.number=@number1stBuild) OR (b.id=@idLastBuild AND b.number=@numberLastBuild)` because of `OR` – Naveed Butt Jan 19 '15 at 11:39
  • Thank you for the quick reply. I tried it and it is still the same execution plan. – Emarian Jan 19 '15 at 11:44

3 Answers3

6

It's not scanning twice. It is seeking twice.

Your query is semantically the same as the below.

SELECT *
FROM   building b
WHERE  b.id = @id1stBuild
       AND b.number = @number1stBuild
UNION
SELECT *
FROM   building b
WHERE  b.id = @idLastBuild
       AND b.number = @numberLastBuild 

And the execution plan performs two seeks and unions the result.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • I never thought of looking at the `OR` operator as a `UNION`, but it actually makes a lot of sense. +1 – Radu Gheorghiu Jan 19 '15 at 12:03
  • 1
    Is `UNION` not `UNION ALL` and that is *very* important. Think what is the result if `@id1stBuild` is equal to `@idLastBuild` and `@number1stBuild` is equal to `@numberLastBuild`. OR query returns one row, while the UNION ALL returns two rows. – Remus Rusanu Jan 19 '15 at 12:13
  • Thanks for all replies. I am new to execution plans and want to study this subject more. I don't understand if sql goes through the table index twice or only once? What could happens if the table has more than, lets say, 10.000.000 rows. This will be used in a web application and I wish to run as quickly. – Emarian Jan 19 '15 at 12:30
  • @Emarian - The larger the table gets the greater the advantage of performing two seeks rather than scanning the whole thing. – Martin Smith Jan 19 '15 at 12:35
4

why is scanning done twice for the same table?

Is not a scan, is a seek, and that makes all the difference.

Implementing OR as a UNION, and then implementing the UNION via a MERGE JOIN. Is called a 'merge union':

Merge union

Now let’s change the query slightly:

select a from T where b = 1 or c = 3

  |--Stream Aggregate(GROUP BY:([T].[a]))
   |--Merge Join(Concatenation)
        |--Index Seek(OBJECT:([T].[Tb]), SEEK:([T].[b]=(1)) ORDERED FORWARD)
        |--Index Seek(OBJECT:([T].[Tc]), SEEK:([T].[c]=(3)) ORDERED FORWARD)

Instead of the concatenation and sort distinct operators, we now have a merge join (concatenation) and a stream aggregate. What happened? The merge join (concatenation) or “merge union” is not really a join at all. It is implemented by the same iterator as the merge join, but it really performs a union all while preserving the order of the input rows. Finally, we use the stream aggregate to eliminate duplicates. (See this post for more about using stream aggregate to eliminate duplicates.) This plan is generally a better choice since the sort distinct uses memory and could spill data to disk if it runs out of memory while the stream aggregate does not use memory.

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
2

You can try the following, which gives only one seek and a slight performance improvement. As @Martin_Smith says what you have coded is the equivalent of a Union

SELECT *
FROM building b
WHERE b.id IN (@id1stBuild , @idLastBuild) 
    AND 
        (
            (b.id = @id1stBuild AND b.number = @number1stBuild) OR 
            (b.id = @idLastBuild AND b.number = @numberLastBuild)
        )
Steve Ford
  • 7,433
  • 19
  • 40
  • 1
    It gives a single seek iterator but that still performs two seeks. [When is a Seek not a Seek?](http://sqlblog.com/blogs/paul_white/archive/2011/02/16/when-is-a-seek-not-a-seek.aspx) – Martin Smith Jan 19 '15 at 12:13
  • @MartinSmith in this case the query plan suggests that the parameters in the IN clause are sorted, then the SEEK predicate uses a range scan where id > start and id < end rather than two seeks. – Steve Ford Jan 19 '15 at 15:27
  • It uses a merge interval to eliminate duplicates. The seek is executed twice. See the "number of executions" in the actual execution plan. – Martin Smith Jan 19 '15 at 15:35
  • 1
    @MartinSmith indeed and I see you have contributed to a number of answers on this subject before for instance http://dba.stackexchange.com/questions/14789/can-you-explain-this-execution-plan – Steve Ford Jan 19 '15 at 15:50