0

To my understinding it should be simple, does colA='X' or ColB='X', if any one of them is true then return the result otherwise not.

TableA
╔════╦═══╦═══╦═══╦═══╗
║ id ║ A ║ B ║ C ║ D ║
╠════╬═══╬═══╬═══╬═══╣
║ 1  ║ x ║ x ║ x ║ x ║
║ 2  ║ x ║ x ║ x ║ x ║
║ 3  ║ x ║ x ║ x ║ x ║
╚════╩═══╩═══╩═══╩═══╝

TableB
╔════╦═══╗
║ id ║ A ║
╠════╬═══╣
║  1 ║ x ║
║  2 ║ x ║
║  3 ║ x ║
╚════╩═══╝

select T2.ID from tableA T1
JOIN TableB T2
ON T1.A=T2.A OR T1.B=T2.A OR T1.C=T2.A OR T1.D=T2.A /* **takes 6+ min** */

This simple join takes 6+ minutes. If I join using only one field, then it is pretty quick. Also if I write 4 different queries, one for each OR and then Union the result, it is very quick also.

My question is, How does SQL OR Query works? Does it work on current row, does all the ORING and move on to the next or does it do the complete OR of the two tables for each OR. Even then it should take long as seperate queries are fast enough. I out think an internal optimiser can optiise this really quick: I am comparing one column from one table to 4 column of second table. It should be easy to optimise. Why is it taking so long? Is there a way to manually optimize this query. Btw I tried changing = to != or Or to AND but it takes the same amount of time.

TheTechGuy
  • 16,560
  • 16
  • 115
  • 136
  • 2
    What sizes are these tables? Are there indexes defined on the join fields? – Oded Sep 20 '12 at 19:38
  • Have you checked the execution plan of the query? Also, I'm in awe of your table formatting. – LittleBobbyTables - Au Revoir Sep 20 '12 at 19:38
  • 4
    I'm highly suspect of the schema design that would require this kind of joining in the first place. Fixing the schema may be your best chance of optimizing your performance. – Joe Stefanelli Sep 20 '12 at 19:39
  • yes, it does not say much. About 90% execution time show in JOIN but it is indifferent weather I use multiple ORs or not. Yes I have two primary IDs with indexes. Indexces is `not defined` on `this field` btw which I am using JOIN ON. – TheTechGuy Sep 20 '12 at 19:40
  • The four separate queries probably all make excellent use of indices, and perhaps of hash-comparisons, whereas the single `OR` query is probably doing full scans. I wonder — what happens if you write `ON T2.A IN (T1.A, T1.B, T1.C, T1.D)`? Depending on what indices you have, I think that *may* perform better. – ruakh Sep 20 '12 at 19:40
  • @JoeStefanelli agree but this is old schema written years ago. I am changing that right now but the question is about how OR works. I can't use this query as it is, coz it is actually going to be my subquery in another query and it already takes 6 min. – TheTechGuy Sep 20 '12 at 19:42
  • @ruakh tried that, it is WORSE – TheTechGuy Sep 20 '12 at 19:45
  • @Oded tables are 50k rows and 400k rows but I do filter the larger table, so it reduced to 20k rows. – TheTechGuy Sep 20 '12 at 19:47
  • 1
    As a last-ditch effort, you could use join hints to force the join type to see if you get any improvement: http://msdn.microsoft.com/en-us/library/ms173815.aspx and http://stackoverflow.com/questions/2446927/sql-server-2008-join-hints – LittleBobbyTables - Au Revoir Sep 20 '12 at 19:48
  • Well, with multiple `OR` clauses like that, you would get the product of the two tables 4 times before that result is filtered by table scans (exacerbated by not having indexes on the join fields) – Oded Sep 20 '12 at 19:49
  • What exactly are you attempting to do? Joins are meant for pointing to a common field between two tables. If you're trying to find matches between two tables, use the WHERE clause. – mElling Sep 20 '12 at 19:38
  • 90% of execution time, or 90% of cost (according to Management Studio)? There's a big difference. – Aaron Bertrand Sep 20 '12 at 19:52
  • operation cost = 89% with no OR, 96% with 3 ORs – TheTechGuy Sep 20 '12 at 19:55
  • @HABO 25 in one and 90+ in the other. But I am pulling only one field though, the primary key of one table. – TheTechGuy Sep 20 '12 at 20:04
  • Try an index on `TableA` on columns `A`, `B`, `C` and `D`, and an index on `TableB` on column `A` and _including_ column `Id`. You might want to see what effect including vs. indexing the `TableA` columns has. – HABO Sep 20 '12 at 20:11
  • How and where do you filter the larger table? Are you sure there isn't a performance issue there? Is it done as part of the problem query where is may affect the choice of index or is it done with a temporary table? If you tried indexing the tables as suggested earlier you should have seen a substantial decrease in the number of read I/Os. Have you checked that? – HABO Sep 25 '12 at 15:02
  • @HABO I am not sure if I can put index on a non unique column. Will explore that. Filtering the larger table does not affect performance. The problem starts only when I put OR in the JOIN – TheTechGuy Sep 25 '12 at 19:45
  • The inquiry re: filtering was an effort to determine how the filtering may interact with the suggestions being offered here. If the filtering causes the DB engine to use a particular index then the solution may be to include additional columns in that index to support the `INNER JOIN` conditions (or create a new index that duplicates an existing index and includes additional columns). Adding another index may not do any good if the optimizer has chosen a different index for perfectly valid reasons. Indexes do add overhead and should not be create without a purpose. – HABO Sep 25 '12 at 20:01
  • @HABO I posted my own answer unless someone comes up with a better one. The problem for now I believe is beyond indexes. There is something with OR that make it slow. – TheTechGuy Sep 25 '12 at 21:03
  • 1
    possible duplicate of [Is having an 'OR' in an INNER JOIN condition a bad idea?](http://stackoverflow.com/questions/5901791/is-having-an-or-in-an-inner-join-condition-a-bad-idea) – Martin Smith Sep 25 '12 at 21:34

2 Answers2

0

When multiple 'OR' conditions are used, all of them will be evaluated, even if there are many and the first one evaluated returns true.

By writing the separate queries and combining the results with unions, you are effectively reducing the work done. This is because rows included in the results for the first query won't need to be evaluated in the other 3 queries, rows from the second query won't need to be evaluated in the 3rd and 4th queries, etc.

PeteGO
  • 5,597
  • 3
  • 39
  • 70
0

So far I have used this solution, it takes 1 min 16 sec to execute. On the other hand the JOIN with OR query is taking 30 min and the result of the two queries is not the same.

select T2.ID from tableA T1
JOIN TableB T2
ON T1.A=T2.A 

UNION

select T2.ID from tableA T1
JOIN TableB T2
ON T1.B=T2.A 

UNION

select T2.ID from tableA T1
JOIN TableB T2
ON T1.C=T2.A 

UNION

select T2.ID from tableA T1
JOIN TableB T2
ON T1.D=T2.A 
TheTechGuy
  • 16,560
  • 16
  • 115
  • 136