44

I have a four tables, TopLevelParent, two mid level tables MidParentA and MidParentB, and a Child table which can have a parent of MidParentA or MidParentB (One or the other midParent must be in place). Both mid level tables have a parent table of TopLevelParent.

The Top Level table look like this:

TopLevelId | Name
--------------------------
1          | name1   
2          | name2   

The MidParent tables look like this:

MidParentAId | TopLevelParentId |           MidParentBId | TopLevelParentId |
------------------------------------       ------------------------------------
1            |        1         |           1            |        1         |
2            |        1         |           2            |        1         |

The Child table look like this:

ChildId | MidParentAId | MidParentBId
--------------------------------
1       |     1        |   NULL
2       |    NULL      |     2

I have used the following left join in a larger stored procedure which is timing out, and it looks like the OR operator on the last left join is the culprit:

SELECT *    
FROM TopLevelParent tlp
LEFT JOIN MidParentA a ON tlp.TopLevelPatientId = a.TopLevelPatientId
LEFT JOIN MidParentB a ON tlp.TopLevelPatientId = b.TopLevelPatientId
LEFT JOIN Child c ON c.ParentAId = a.ParentAId OR c.ParentBId = b.ParentBId

Is there a more performant way to do this join?

Declan McNulty
  • 3,194
  • 6
  • 35
  • 54

5 Answers5

43

Given how little of the query is being exposed; a very rough rule of thumb is to replace an Or with a Union to avoid table scanning.

Select..
LEFT JOIN Child c ON c.ParentAId = a.ParentAId 
union
Select..
left Join Child c ON c.ParentBId = b.ParentBId
u07ch
  • 13,324
  • 5
  • 42
  • 48
  • 1
    That helps :) You could have two joins to Child rather than the one each one doing one side of the OR; and then use coalesce to get a value from C or D where you only want the one value – u07ch Nov 01 '13 at 11:21
  • 1
    Could someone please explain why is it faster? conceptually Or and Union seem the same to me, so why does Or perform a table scan while Union doesn't? I googled the topic but found nothing. – Arik Dec 05 '22 at 10:36
9

Here is what I did in the end, which got the execution time down from 52 secs to 4 secs.

SELECT * 
FROM (
    SELECT tpl.*, a.MidParentAId as 'MidParentId', 1 as 'IsMidParentA' 
    FROM TopLevelParent tpl 
    INNER JOIN MidParentA  a ON a.TopLevelParentId = tpl.TopLevelParentID
UNION
    SELECT tpl.*, b.MidParentBId as 'MidParentId', 0 as 'IsMidParentA'  
    FROM TopLevelParent tpl 
    INNER JOIN MidParentB b ON b.TopLevelParentId = tpl.TopLevelParentID
UNION
    SELECT tpl.*, 0 as 'MidParentId', 0 as 'IsMidParentA'  
    FROM TopLevelParent tpl 
    WHERE tpl.TopLevelParentID NOT IN (
       SELECT pa.TopLevelParentID 
       FROM TopLevelParent tpl
       INNER JOIN MidParentA  a ON a.TopLevelParentId = tpl.TopLevelParentID
    UNION
       SELECT pa.TopLevelParentID 
       FROM TopLevelParent tpl
       INNER JOIN MidParentB b ON h.TopLevelParentId = tpl.TopLevelParentID
    )
) tpl
LEFT JOIN MidParentA a ON a.TopLevelParentId = tpl.TopLevelParentID
LEFT JOIN MidParentB b ON b.TopLevelParentId = tpl.TopLevelParentID
LEFT JOIN 
(
        SELECT  [ChildId]
                ,[MidParentAId] as 'MidParentId'
                ,1 as 'IsMidParentA'
        FROM Child c
        WHERE c.MidParentAId IS NOT NULL
   UNION
        SELECT [ChildId]
               ,[MidParentBId] as 'MidParentId'
               ,0 as 'IsMidParentA'
        FROM Child c
        WHERE c.MidParentBId IS NOT NULL
) AS c
ON c.MidParentId = tpl.MidParentId  AND c.IsMidParentA = tpl.IsMidParentA

This eliminates the table scanning that was happening, as I have matched the top level record to its midlevel parent up front if it exists, and stamped it on that record.

I have also done the same with the child record meaning I can then just join the child record to the top level record on the MidParentId, and I use the IsMidParentA bit flag to differentiate where there are two identical MidParentIds (ie an Id of 1 for IsMidParentA and IsMidParentB).

Thanks to all who took the time to answer.

Declan McNulty
  • 3,194
  • 6
  • 35
  • 54
7

You should take care of using predicates inside On.

"It is very important to understand that, with outer joins, the ON and WHERE clauses play very different roles, and therefore, they aren’t interchangeable. The WHERE clause still plays a simple filtering role—namely, it keeps true cases and discards false and unknown cases. Use something like this and use predicates in where clause. However, the ON clause doesn’t play a simple filtering role; rather, it’s more a matching role. In other words, a row in the preserved side will be returned whether the ON predicate finds a match for it or not. So the ON predicate only determines which rows from the nonpreserved side get matched to rows from the preserved side—not whether to return the rows from the preserved side." **Exam 70-461: Querying Microsoft SQL Server 2012

Amir Keshavarz
  • 3,050
  • 1
  • 19
  • 26
2

another way to write it:

LEFT JOIN Child c ON c.ParentAId = COALESCE(a.ParentAId, b.ParentBId)

Edit

One possible approach is querying first the MidParentA and then the MidParentB and then UNION the results:

SELECT tlp.*,
       a.MidParentAId,
       null MidParentBId,
       c.ChildId
FROM TopLevelParent tlp
LEFT JOIN MidParentA a ON tlp.TopLevelPatientId = a.TopLevelPatientId
LEFT JOIN Child c ON c.MidParentAId = a.MidParentAId 
UNION
SELECT tlp.*,
       null MidParentAId,
       b.MidParentBId,
       c.ChildId
FROM TopLevelParent tlp
LEFT JOIN MidParentB b ON tlp.TopLevelPatientId = b.TopLevelPatientId
LEFT JOIN Child c ON c.MidParentBId = b.MidParentBId 

A demo in SQLFiddle

mucio
  • 7,014
  • 1
  • 21
  • 33
  • The original query had c.ParentBId = b.ParentBId rather than c.parentAID = b.parentBID so this might not be right – u07ch Nov 01 '13 at 11:00
  • you are right, at first glance I didn't read carefully the question – mucio Nov 01 '13 at 12:13
1

Just to add something for future observers of this answer - sometimes a UNION as described above isn't suitable as the JOIN could be in the middle of a big query that would require lots of replication. This is where an APPLY comes in handy as you could use it without needing to replicate the entire outer query, as it has access to the columns from the outer query. Note: This is in reference to SQL Server only.

SELECT *    
FROM TopLevelParent tlp
LEFT JOIN MidParentA a 
    ON tlp.TopLevelPatientId = a.TopLevelPatientId
LEFT JOIN MidParentB a 
    ON tlp.TopLevelPatientId = b.TopLevelPatientId
OUTER APPLY (
    SELECT * FROM Child WHERE Child.ParentAId = a.ParentAId
    UNION
    SELECT * FROM Child WHERE Child.ParentBId = b.ParentBId
) c
Murphybro2
  • 2,207
  • 1
  • 22
  • 36