SELECT DISTINCT With WHERE Clause is VERY SLOW vs. SELECT DISTINCT with INNER JOIN AND Clause

Question

We have a production stored procedure that executes in 2-3 seconds everywhere except one client's environment.

Their environment appears healthy with 24 cores, 64 GB RAM and is nowhere near capacity. The SQL Server is 2008 r2 SP2.

I have restored the DB in my testing environment and the sproc returns in 2 seconds, but takes 20-50 minutes in the client's environment.

Today I setup a new instance of SQL Server on the same server and it also takes 20-50 minutes to execute the stored procedure.

Our DBA found the problem statement and devised a workaround.

Original:

--This returns in 30 minutes.
SELECT DISTINCT
          P.ProjectID, P.ProjectName
FROM DocumentRoute DR

LEFT JOIN Routes R
ON R.RouteID = DR.RouteID

INNER JOIN Documents D
ON D.DocumentID = DR.DocumentID
AND D.Status = 1

INNER JOIN Files F
ON F.FileID = D.FileID
AND F.Status = 1

INNER JOIN Projects P
ON P.ProjectID = F.ProjectID
AND P.Status = 1

LEFT OUTER JOIN Users U
ON U.UserID = DR.UserID

WHERE DR.Status = 1

Original Execution Plan here: http://screencast.com/t/xGcRIE9o

Workaround:

--This returns in 2 seconds.
SELECT DISTINCT
          P.ProjectID, P.ProjectName
FROM DocumentRoute DR

LEFT JOIN Routes R
ON R.RouteID = DR.RouteID
AND DR.Status = 1

INNER JOIN Documents D
ON D.DocumentID = DR.DocumentID
AND D.Status = 1

INNER JOIN Files F
ON F.FileID = D.FileID
AND F.Status = 1

INNER JOIN Projects P
ON P.ProjectID = F.ProjectID
AND P.Status = 1

LEFT OUTER JOIN Users U
ON U.UserID = DR.UserID

Revised Execution Plan here: http://screencast.com/t/Fqg90w6NDyZd

What in the client's environment could possibly account for the massive difference in execution time?

Additional info: When I got the execution plans from the client, the problem statement by itself finished in 17 seconds, but the entire sproc has been running for 15+ minutes and will probably take another 15 to finish.

Assuming you viewed the execution plan, do you detect any invalid index usages? Dynamic statistics could cause the optimizer to elect an unexpected (or no) index. Use of CTE's have helped me in the past as it forces execution. Beware, any efforts to sidestep the optimizer may leave future upgrades vulnerable to problems. — barrypicker, Feb 28 '14 at 22:40
@barrypicker What does "use of CTE's forces execution" mean? — ErikE, Feb 28 '14 at 22:44
The DBA did not seem concerned by the execution plan, but since we have gone this far, I may compare the client exec plan vs my exec plan. I actually added some indexes to get the execution time under 30 seconds in the client ENV, this worked for a week, then since Monday it has taken 30 minutes to return. — unhappyCrackers1, Feb 28 '14 at 22:45
@ErikE - if using a CTE as part of the query, the execution of the CTE will have to occur first. Once the results for the CTE have been fetched, they bind to the query using the CTE. This is a much different approach from the optimizer perspective than the query above. — barrypicker, Mar 03 '14 at 01:35
Hmmm, your original execution plan shows 138%. That doesn't make any sense... starting from upper left to lower right... 0 0 1 0 0 0 1 86 50 — barrypicker, Mar 03 '14 at 01:41
@barrypicker That is completely incorrect. A CTE is nothing more than a derived table. Execution order is not forced. Predicate push down occurs. Your information is sadly mistaken. — ErikE, Mar 03 '14 at 06:03
@ErikE - your comment - Pedicate push-down occurs is correct. But a CTE is NOT a derived table - it allows functions not possible on a derived table, such as recursion. My comment regarding forcing execution is loose, I only meant this logically, not technically. Using a CTE is just a possible option to have the optimizer behave differently. When constructing basic queries, two nearly identical, but one with a CTE, I find the optimizer chooses a drastically different execution plan. This can be used to the advantage of a SQL Developer. — barrypicker, Mar 03 '14 at 20:30
@barrypicker I have no problem with the idea that a CTE may use a different execution plan. There is controversy over what is a derived table and what is not--so I'm fine calling both CTEs and derived tables *table expressions*. I have to stay firm, though, that given predicate push down and join reordering, a CTE does NOT have to execute first. This example shows such a "CTE pre-execution" concept as false: `WITH A AS(SELECT R=NewID())SELECT R FROM A CROSS JOIN(SELECT 1 UNION ALL SELECT 2)X(B);`. It returns two different values, not a single value twice. — ErikE, Mar 03 '14 at 22:32

ErikE · Answer 1 · 2014-03-03T20:18:22.960

The first problem is that these two queries do not have the same meaning and can return different results. The first query will only produce DocumentRoute rows where Status = 1. The second query will produce all DocumentRoute rows, and where the Status is null or is not equal to 1, will not perform the join to Routes.

The second problem is that if you are just selecting columns from the Projects table using DISTINCT, the LEFT JOINs can't possibly change the query in any way--so you may as well remove them.

Finally, without giving us some idea of the execution plan for both queries, and possibly some more details about the structure of the tables involved, no one is going to be able to definitively give you an answer about what is going on. Furthermore, the execution plan within your environment (where it takes 2 seconds) is not going to be helpful. We have to know the execution plan in the environment where it is running slowly.

To get an execution plan, run SET STATISTICS XML ON; first, run the desired query, and see the execution plan shown in a result set given after the result set of the query itself.

Some thoughts on what could be causing the problem:

Are statistics set to update automatically? If not, the server could choose a poor plan.
What is the fragmentation of the tables involved in the query?
Does the client database server have its tempdb files set up properly--if there is more than one, are they all the same size?

We setup the TempDB anew on a separate drive with 4 data files 1GB each, we rebuilt the indexes (0% fragmentation) and AutoUpdateStatistics = True. — unhappyCrackers1, Feb 28 '14 at 23:18
Those sound good, so now we need to look at the execution plans. — ErikE, Mar 01 '14 at 02:12

score 1 · Accepted Answer · answered Mar 03 '14 at 20:19

With the advice of @ErikE, I scrutinized the client's execution plans for the multiple statements within the stored procedure and I noticed several parallelism operators.

So knowing that they had 24 processors on the server vs. 1 on mine, the light bulb lit up and I decided to try MAXDOP of 1 and behold, the execution time went from 45 minutes to 2 seconds.

sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
GO

sp_configure 'max degree of parallelism', 1;
GO
RECONFIGURE WITH OVERRIDE

here is a link describing max degree of parallelism issues... http://sqlblog.com/blogs/paul_white/archive/2012/05/03/parallel-row-goals-gone-rogue.aspx — barrypicker, Mar 03 '14 at 23:10

SELECT DISTINCT With WHERE Clause is VERY SLOW vs. SELECT DISTINCT with INNER JOIN AND Clause

2 Answers2