
I have a problem that seems illogical, and I just can't figure it out.

I am working on a complex query. After I made a small change, it started taking over 2 minutes to execute instead of one second. Can someone explain to me how this is even possible? What could be causing this?

First query

DECLARE @CRUISE_ID int = 10001890 --:CRUISE_ID
SELECT
/* ... */
FROM Cruise_Itinerary with(nolock)
INNER JOIN Cruise with(nolock) ON Cruise_Itinerary.CRUISE_ID = Cruise.CRUISE_ID 
  AND (Cruise.CRUISE_ID = @CRUISE_ID) 
/* ... */

Second query

DECLARE @CRUISE_ID int = 10001890 --:CRUISE_ID
SELECT
/* ... */
FROM Cruise_Itinerary with(nolock)
INNER JOIN Cruise with(nolock) ON Cruise_Itinerary.CRUISE_ID = Cruise.CRUISE_ID 
  AND (@CRUISE_ID is null OR Cruise.CRUISE_ID = @CRUISE_ID) 
/* ... */

The first query executes in one second, but the second one takes over 2 minutes to execute. I just don't get it. What is the difference between

AND (10001890 is null OR Cruise.CRUISE_ID = 10001890)

and

AND (@CRUISE_ID is null OR Cruise.CRUISE_ID = @CRUISE_ID)?

The variable @CRUISE_ID has no other occurrences in the entire query.


Edit: I figured it out with the help of my colleagues and you guys.

Here is a good explanation of what is going on: http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/

The optimal plan differs completely depending on what parameters are passed. The optimizer can't tell that, so it plays it safe and creates plans that will always work. That's (one of the reasons) why, in the article's first example, it was an index scan, not an index seek.

We can see from the execution plan of the second query that the index scan happens at the end of the plan. I checked: it still takes over 2 minutes to execute if I remove the whole condition.

[Execution plan for the second query]
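
For completeness: a commonly suggested fix for this kind of catch-all query is OPTION (RECOMPILE), which compiles a fresh plan for the actual parameter value on each execution. A minimal sketch against the same tables, with the elided parts of the query left as placeholders:

DECLARE @CRUISE_ID int = 10001890 --:CRUISE_ID
SELECT
    Cruise.CRUISE_ID /* ... */
FROM Cruise_Itinerary with(nolock)
INNER JOIN Cruise with(nolock) ON Cruise_Itinerary.CRUISE_ID = Cruise.CRUISE_ID
  AND (@CRUISE_ID is null OR Cruise.CRUISE_ID = @CRUISE_ID)
/* ... */
-- Recompiling on every run lets the optimizer treat @CRUISE_ID as a known value,
-- so a non-NULL value can use an index seek instead of the catch-all scan.
OPTION (RECOMPILE)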

drinovc
  • why would you ask if "10001890 is null"? (or if @CRUISE_ID is null [since it's always declared]?) – BryanOfEarth Dec 22 '15 at 16:13
  • Because it's a parameter in the real query - `DECLARE @CRUISE_ID int = :CRUISE_ID`. This one is just a test case. – drinovc Dec 22 '15 at 16:16
  • Do you get the same results when you declare the variable right before the query (as shown in your question) as when you use the parameter? – Siyual Dec 22 '15 at 16:20
  • @Siyual yes. It takes the same time using the parameter as it does if I execute it with a fixed value in Microsoft SQL Server Management Studio. – drinovc Dec 22 '15 at 16:35
  • This can happen if you have stale histogram information. If histogram information is set to 100% but not up to date, a new value not present in the index can trigger a full table scan. Check if the index histogram information is stale. – Norbert Dec 22 '15 at 16:37
  • @NorbertvanNobelen how do I know if the index histogram is stale? I ran `DBCC SHOW_STATISTICS(Cruise, IX_Cruise_ID) WITH HISTOGRAM;` but I don't know exactly what I should look for. – drinovc Dec 22 '15 at 16:48
  • Have you checked the query plans of the different queries? – Tom H Dec 22 '15 at 17:10
  • Open the table properties and check if statistics update is AUTO. You can also rerun the statistics update just to see if it solves the issue (so then you know the cause & solution direction) – Norbert Dec 22 '15 at 17:12
  • I needed to remove most parts of the query to get the smallest working example. It was then small enough to see that the query plans are different. This has to be it. Something is crushing performance. My guess is that it's hiding in one of the views that I'm using. – drinovc Dec 23 '15 at 09:48
  • I found a similar article: http://stackoverflow.com/questions/8992534/why-are-the-performances-of-these-2-queries-so-different – drinovc Dec 23 '15 at 09:49
  • http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/ – Martin Smith Dec 23 '15 at 10:48
  • Thank you @MartinSmith for this great article on this issue. I will recommend it to others too. – drinovc Dec 23 '15 at 11:02
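
Following up on the statistics suggestion in the comments above, here is a minimal sketch (assuming the table really is named Cruise, as in the comments) of how one might check when its statistics were last updated and refresh them:

-- When were the statistics on Cruise last updated?
SELECT s.name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('Cruise');

-- Rebuild the statistics with a full scan, as suggested in the comments.
UPDATE STATISTICS Cruise WITH FULLSCAN;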

1 Answer


Firstly, the logic in your query seems redundant. You are essentially saying "if x and (x or y)". We (humans) might think along the lines of:

Given that x (Cruise.CRUISE_ID = @CRUISE_ID) must in this instance be true to satisfy the AND logic, the second condition (@CRUISE_ID is null OR Cruise.CRUISE_ID = @CRUISE_ID) can be ignored, so we take x being true as the starting point for our reasoning.

The SQL query optimiser, however, clearly decides that the query plan must ensure that both sides of the AND can be met, and thus reasons something along the lines of:

With just the equality condition (Cruise.CRUISE_ID = @CRUISE_ID) the plan can start by performing a (clustered?) index seek on the Cruise table on the basis of the (presumably indexed) CRUISE_ID. When you add in the OR'd condition, the optimiser can no longer perform this seek, because the extra predicate (@CRUISE_ID is null) must also be taken into account in (@CRUISE_ID is null OR Cruise.CRUISE_ID = @CRUISE_ID). Therefore the whole of the Cruise_Itinerary table has to be scanned (there are no other indexed columns it can use), and the join onto Cruise is then performed before the various conditions are checked as part of the join.

Essentially it is doing what you are asking: if the value is NULL then everything must be returned, with predictably devastating consequences for performance. You would be better off using an IF...ELSE block to ensure that the query plan is optimised for both possible options (@CRUISE_ID is null / is not null).
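
A minimal sketch of that IF...ELSE approach, assuming the same tables and leaving the elided parts of the original query as placeholders:

IF @CRUISE_ID IS NULL
BEGIN
    -- No filter requested: this branch gets a plan built for returning all rows.
    SELECT Cruise.CRUISE_ID /* ... */
    FROM Cruise_Itinerary with(nolock)
    INNER JOIN Cruise with(nolock) ON Cruise_Itinerary.CRUISE_ID = Cruise.CRUISE_ID
    /* ... */
END
ELSE
BEGIN
    -- Filter requested: this branch gets a plan that can seek on CRUISE_ID.
    SELECT Cruise.CRUISE_ID /* ... */
    FROM Cruise_Itinerary with(nolock)
    INNER JOIN Cruise with(nolock) ON Cruise_Itinerary.CRUISE_ID = Cruise.CRUISE_ID
      AND Cruise.CRUISE_ID = @CRUISE_ID
    /* ... */
END

Each branch can then be optimised on its own, rather than one plan having to cover both cases.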

strickt01
  • Hi. I guess I was not clear enough. I am using only one condition at a time. But your answer is the most accurate description of what is actually happening. I always thought that the query optimizer would know the value of `@CruiseID is null` and treat it like a constant value. But I guess not. It's not that smart. – drinovc Dec 23 '15 at 10:56