5

I have a dual self-join query where the performance is severely degraded when the search values are swapped.

-- 500,000 i/o & 500ms execution
select
  fooA.ID
  , fooB.ID
from
  foo AS fooA
  INNER JOIN bar AS barA ON fooA.barID = barA.barID
  INNER JOIN foo AS fooB ON fooA.fooID = fooB.fooID -- self join
  INNER JOIN bar AS barB ON fooB.barID = barB.barID
where
  barA.value = 'xyz'
  AND barB.value = '60'

-- 5,000 i/o & 5ms execution
select
  fooA.ID
  , fooB.ID
from
  foo AS fooA
  INNER JOIN bar AS barA ON fooA.barID = barA.barID
  INNER JOIN foo AS fooB ON fooA.fooID = fooB.fooID -- self join
  INNER JOIN bar AS barB ON fooB.barID = barB.barID
where
  barA.value = '60'
  AND barB.value = 'xyz'
  • The value "xyz" is listed 150,000 times in the "bar" table.
  • The value "60" is listed 500 times in the "bar" table.
  • The query plans are the same, except that the inner-most loop returns either 150,000 rows or 500 rows depending on which search value is listed first.
  • The searches perform seeks on non-clustered indexes.
  • Statistics were updated on both tables with FULLSCAN.

Why doesn't the SQL query optimizer correctly identify that in both instances the inner-most join of the query plan should be the one with the least amount of rows?

Josh Bond
  • 242
  • 1
  • 5

1 Answers1

3

SQL Server does not recompile a query every time it is executed. It does this once, and it optimized for the exact values it sees the first time. I guess you have an unfortunate plan cached.

Try adding OPTION (RECOMPILE) at the end.

usr
  • 168,620
  • 35
  • 240
  • 369
  • I added this and there was no change. The estimated number of rows from the inner-most join is 2,500 for both query plans, but the actual is widely different. – Josh Bond May 15 '12 at 20:18
  • That is strange. Cardinality estimation on indexes for equality matching is usually very accurate. Do the tables you are seeking on contain very high numbers of rows with very few duplicates? – usr May 15 '12 at 20:24
  • I really fail to see anything at all that could cause trouble here. Can you post screenshots of the two actual execution plans? – usr May 15 '12 at 20:25
  • Image #1 is the "bad" query plan and image #2 is the "good" query plan. http://imgur.com/a/dv1ck – Josh Bond May 15 '12 at 20:31
  • Hm I don't know the reason for this behavior. Potential workaround: Use an indexed view to materialize one of the joins. This will help a) reduce runtime cost of the join to nearly nothing b) help rowcount estimation (you need to create statistics on the view manually). Indexed views are a performance sledge-hammer for queries where they apply. – usr May 15 '12 at 20:37
  • Still the same problem. I created the view joining "foo" & "bar", then the index with (value, barID, fooID) and then a query than joined the two views. In the execution plan the inner-most query shows the view but the row count is still inflated for the "bad" query. – Josh Bond May 15 '12 at 22:16