I have a dual self-join query where the performance is severely degraded when the search values are swapped.
-- 500,000 i/o & 500ms execution
select
fooA.ID
, fooB.ID
from
foo AS fooA
INNER JOIN bar AS barA ON fooA.barID = barA.barID
INNER JOIN foo AS fooB ON fooA.fooID = fooB.fooID -- self join
INNER JOIN bar AS barB ON fooB.barID = barB.barID
where
barA.value = 'xyz'
AND barB.value = '60'
-- 5,000 i/o & 5ms execution
select
fooA.ID
, fooB.ID
from
foo AS fooA
INNER JOIN bar AS barA ON fooA.barID = barA.barID
INNER JOIN foo AS fooB ON fooA.fooID = fooB.fooID -- self join
INNER JOIN bar AS barB ON fooB.barID = barB.barID
where
barA.value = '60'
AND barB.value = 'xyz'
- The value "xyz" is listed 150,000 times in the "bar" table.
- The value "60" is listed 500 times in the "bar" table.
- The query plans are the same, except that the inner-most loop returns either 150,000 rows or 500 rows depending on which search value is listed first.
- The searches perform seeks on non-clustered indexes.
- Statistics were updated on both tables with FULLSCAN.
Why doesn't the SQL query optimizer correctly identify that in both instances the inner-most join of the query plan should be the one with the least amount of rows?