I have a query in SQL Server 2008 R2 in the following form:
SELECT TOP (2147483647) *
FROM (
SELECT *
FROM sub_query_a
) hierarchy
LEFT JOIN (
SELECT *
FROM sub_query_b
) expenditure
ON hierarchy.x = expenditure.x AND hierarchy.y = expenditure.y
ORDER BY hierarchy.c, hierarchy.d, hierarchy.e
The hierarchy
subquery contains UNIONS and INNER JOINS. The expenditure
subquery is based on several levels of sub-subqueries, and contains UNIONS, INNER and LEFT JOINS, and ultimately, a PIVOT aggregate.
The hierarchy
subquery by itself runs in 2 seconds and returns 467 rows. The expenditure
subquery by itself runs in 7 seconds and returns 458 rows. Together, without the ORDER BY
clause, the query runs in 11 seconds. However, with the ORDER BY
clause, the query runs in 11 minutes.
The Actual Execution Plan reveals what's different. Without the ORDER BY
clause, both the hierarchy
and expenditure
subqueries are running once each, with the results being Merge Join (Right Outer Join)
joined together. When the ORDER BY
clause is included, the hierarchy
query is still run once, but the expenditure
portion is run once per row from the hierarchy query, and the results are Nested Loops (Left Outer Join)
joined together. Its as if the ORDER BY
clause is causing the expenditure
subquery to become a correlated subquery (which it is not).
To verify that SQL Server was actually capable of doing the query and producing a sorted result set in 11 seconds, as a test, I created a temp table and inserted the results of the query without the ORDER BY
clause into it. Then I did a SELECT * FROM #temp_table ORDER BY c, d, e
. The entire script took the expected 11 seconds, and returned the desired results.
I want to make the query work efficiently with the ORDER BY
clause as one query--I don't want to have to create a stored procedure just to enable the #temp_table hacky solution.
Any ideas on the cause of this issue, or a fix?