I am attempting to optimize a very large query that has a particularly bad join condition. I have reduced the larger query down to two tables to illustrate the negative behavior I am seeing.
CREATE TABLE #test1 (Id1 INT, Id2 NVARCHAR(256))
CREATE CLUSTERED INDEX PK_test1 ON #test1 (Id1, Id2)
-- Sample (1, "a|b|c|d")
CREATE TABLE #test2 (Id1 INT, Id2 NVARCHAR(256))
CREATE CLUSTERED INDEX PK_test2 ON #test2 (Id1, Id2)
-- Sample (1, "a|b")
The join I am attempting to optimize is a postfix match (query hints added to emulate the join order as these are a part of a much larger query).
DECLARE @id INT = 1
SELECT *
FROM #test1 t1
INNER [HASH] JOIN #test2 t2 ON t1.Id1 = t2.Id1
AND t1.Id2 LIKE t2.Id2 + '%'
-- This condition is the one that is causing issues
-- WHERE t1.Id1 = @id
OPTION (FORCE ORDER)
Without the where clause, the join is able to use the HASH join hint:
However with the where clause the optimizer is unable to build that query instead decides to do this:
Substantially reducing the performance of the query.
As the where condition is reducing the data sets from both tables, I would have expected it to be strictly better, however it is completely changing the seek. It also prevents the join from being a hash or merge join, and it is not clear at all why that would be the case as it should just be reducing the search space for both tables.
The main question here is how can that where condition be having such a massive negative effect to the overall query plan when it shouldn't be changing the index access patterns