JOIN with OR - Query Planner Choosing Nested Loops

Question

I am wondering why does MS SQL Server Query Planner create nested loops rather than choosing a Union for a JOIN with OR Conditions.

Note: From searching on SO, does not seem to be MSSQL specific

E.g.

SELECT * FROM TableA a
JOIN TableB b
ON a.One = b.One 
OR a.Two = b.Two

Is taking 6 minutes in my case (Both One and Two are indexed on both tables)

But

SELECT * FROM TableA a
JOIN TableB b
ON a.One = b.One

UNION -- Not ALL, as need to remove duplicates

SELECT * FROM TableA a
JOIN TableB b
ON a.Two = b.Two

takes 2 seconds.

I know the reason why the first is taking so long (because of the nested loop, where as the 2 unions use the indexes), but I am wondering why does the query planner not choose UNION as the execution plan?

Is there some sort of caveat I need to be aware of when using UNION for which it does not use it?

Why is this logic not implemented into the query planner?

Is it just to keep the code of query planner simpler (As its probably already pretty complex), something that they haven't got round to optimizing or because there is some other caveat that I am not aware of?

@RhysJones is very correct sometimes using indexes will slow you down due to the fact that if using a non-clustered index it still has to search the clustered index for the data. So yes selectivity plays a role here. The other problem might be more mundane but the statistics could be out of date. WHen last did the statistics update? — Namphibian, Feb 24 '15 at 23:05
These queries aren't equivalent. If `TableA` or `TableB` contain duplicates of their *own*, the first query will return them, the second query will not. The optimizer cannot rewrite one to the other in general. — Jeroen Mostert, Feb 24 '15 at 23:15
@jeroen It would just have to use the rid as a dedupifier as it already does in some cases http://blogs.msdn.com/b/craigfr/archive/2006/08/30/732409.aspx — Martin Smith, Feb 24 '15 at 23:25
@MartinSmith: nice find. As that page mentions, these queries really *aren't* equivalent, but the optimizer can do something resembling #2 to achieve #1. — Jeroen Mostert, Feb 24 '15 at 23:30
@Namphibian We have auto update stats on, but we also run UPDATE STATISTICS 100 overnight. (not my choice) Thats another thing I do wonder whether its worth doing as the UPDATE STATS job takes over an hour each night to do, absolutely killing the DISK IO/CPU. — Michal Ciechan, Feb 24 '15 at 23:46
@JeroenMostert Interesting, never though of this possibility (although we do have a Unique Identity column on both tables) hence why I assumed that it will return exactly same results. But interesting none the less — Michal Ciechan, Feb 24 '15 at 23:53
@MartinSmith Thanks for that link! Definitely put the issue Jeroen raised into perspective with detailed info! — Michal Ciechan, Feb 24 '15 at 23:54

score 0 · Answer 1 · answered Feb 24 '15 at 22:47

Is this a selectivity issue? SQL Server likes indexes to be highly selective. One predicate will always be considered more selective than two predicates OR'd and the difference in your case may be the difference between using the indexes or not.

See Bart Duncan's SQL Weblog Query Tuning Fundamentals: Density, Predicates, Selectivity, and Cardinality for a lengthy writeup on selectivity.

JOIN with OR - Query Planner Choosing Nested Loops

1 Answers1