SQL Query Performance Degrades Depending on the Order of the Search Values

Question

I have a dual self-join query where the performance is severely degraded when the search values are swapped.

-- 500,000 i/o & 500ms execution
select
  fooA.ID
  , fooB.ID
from
  foo AS fooA
  INNER JOIN bar AS barA ON fooA.barID = barA.barID
  INNER JOIN foo AS fooB ON fooA.fooID = fooB.fooID -- self join
  INNER JOIN bar AS barB ON fooB.barID = barB.barID
where
  barA.value = 'xyz'
  AND barB.value = '60'

-- 5,000 i/o & 5ms execution
select
  fooA.ID
  , fooB.ID
from
  foo AS fooA
  INNER JOIN bar AS barA ON fooA.barID = barA.barID
  INNER JOIN foo AS fooB ON fooA.fooID = fooB.fooID -- self join
  INNER JOIN bar AS barB ON fooB.barID = barB.barID
where
  barA.value = '60'
  AND barB.value = 'xyz'

The value "xyz" is listed 150,000 times in the "bar" table.
The value "60" is listed 500 times in the "bar" table.
The query plans are the same, except that the inner-most loop returns either 150,000 rows or 500 rows depending on which search value is listed first.
The searches perform seeks on non-clustered indexes.
Statistics were updated on both tables with FULLSCAN.

Why doesn't the SQL query optimizer correctly identify that in both instances the inner-most join of the query plan should be the one with the least amount of rows?

Use criteria that limits the number of rows in your query the most always first. Its a general rule of thumb. — JonH, May 15 '12 at 18:36
Possible answer: query parameterization, parameter sniffing & plan reuse. — Bogdan Sahlean, May 15 '12 at 18:52
Can you make the ids primary keys on the foo and bar tables? — Gordon Linoff, May 15 '12 at 19:06
@JonH unfortunately I don't have the option to ensure which value will come first. This query is generated by Hibernate. — Josh Bond, May 15 '12 at 19:54

score 3 · Accepted Answer · answered May 15 '12 at 18:46

3

SQL Server does not recompile a query every time it is executed. It does this once, and it optimized for the exact values it sees the first time. I guess you have an unfortunate plan cached.

Try adding OPTION (RECOMPILE) at the end.

answered May 15 '12 at 18:46

usr

168,620
35
240
369

I added this and there was no change. The estimated number of rows from the inner-most join is 2,500 for both query plans, but the actual is widely different. – Josh Bond May 15 '12 at 20:18
That is strange. Cardinality estimation on indexes for equality matching is usually very accurate. Do the tables you are seeking on contain very high numbers of rows with very few duplicates? – usr May 15 '12 at 20:24
I really fail to see anything at all that could cause trouble here. Can you post screenshots of the two actual execution plans? – usr May 15 '12 at 20:25
Image #1 is the "bad" query plan and image #2 is the "good" query plan. http://imgur.com/a/dv1ck – Josh Bond May 15 '12 at 20:31
Hm I don't know the reason for this behavior. Potential workaround: Use an indexed view to materialize one of the joins. This will help a) reduce runtime cost of the join to nearly nothing b) help rowcount estimation (you need to create statistics on the view manually). Indexed views are a performance sledge-hammer for queries where they apply. – usr May 15 '12 at 20:37
Still the same problem. I created the view joining "foo" & "bar", then the index with (value, barID, fooID) and then a query than joined the two views. In the execution plan the inner-most query shows the view but the row count is still inflated for the "bad" query. – Josh Bond May 15 '12 at 22:16

SQL Query Performance Degrades Depending on the Order of the Search Values

1 Answers1