2

Hi I have below query in an SP

@CrmContactId is a parameter to the SP.

Select distinct A.PolicyBusinessId, A.PolicyDetailId 
from TPolicyBusiness A
inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
Left Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
where (b.CRMContactId = @CRMContactId)  

we made a new change and introduced an OR condition

Select distinct A.PolicyBusinessId, A.PolicyDetailId 
from TPolicyBusiness A
inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
Left Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
where (b.CRMContactId = @CRMContactId OR C.CRMContactId = @CRMContactId) 

Execution plan: enter image description here

Attachment : Execution Plan

but this change caused a huge performance issue in live server. TPolicyBusiness and TPolicyOwner are heavy tables having millions of records. TAdditionalOwner table is a light table with only few records.

To fix this issue Union all was instead of OR condition.

Select distinct A.PolicyBusinessId, A.PolicyDetailId 
        From
        (
            Select A.PolicyBusinessId, A.PolicyDetailId 
            from TPolicyBusiness A
            inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
            where b.CRMContactId = @CRMContactId
            union all
            Select A.PolicyBusinessId, A.PolicyDetailId 
            from TPolicyBusiness A
            Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
            where C.CRMContactId = @CRMContactId
        ) as A

Execution Plan: enter image description here

Attachement Execution Plan

Could someone please explain why the introduction of OR caused an issue and why using Union is better than OR in this case?

Bhaskar
  • 1,680
  • 7
  • 26
  • 40
  • 2
    Did you look at the execution plan ? Does it change the indexes usage, for example ? – Raphaël Althaus Aug 27 '14 at 07:03
  • Without seeing the execution plans and your DDL it is hard to say exactly. My guess would be that the optimiser uses an index scan when using OR, however the narrower cardinality of each query when separated means that when using UNION the optimiser uses an index seek one or both tables in each query. – GarethD Aug 27 '14 at 07:06
  • Ok thanks @RaphaëlAlthaus and GarethD I will have a look into the execution plan and will update the question. – Bhaskar Aug 27 '14 at 07:09
  • I have updated the question with execution plans – Bhaskar Aug 27 '14 at 08:30
  • There is no index covering TAdditionalOwner.PolicyBusinessId. – simo.3792 Aug 27 '14 at 23:57

4 Answers4

9

Using UNION ALL to replace OR is actually one of the well known optimization tricks. The best reference and explanation is in this article: Index Union.

The gist of it is that OR predicates that could be be satisfied by two index seeks cannot be reliably detected by the query optimizer (the reason being impossibility to predict the disjoint sets from the two sides of the OR). So when expressing the same condition as an UNION ALL then the optimizer has no problem creating a plan that does two short seeks and unions the results. The important thing is to realize that a=1 or b=2 can be different from a=1 union all b=2 because the first query returns rows that satisfy both conditions once, while the later returns them twice. When you write the query as UNION ALL you are telling the compiler that you understand that and you have no problem with it.

For further reference see How to analyse SQL Server performance.

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
0

Query1, the left join and the where clause combine to mean that table C is effectively ignored (not referenced in select list either), so you basically have a 2 table inner join query.

Query 2, the left join in now almost an inner join because that table is referenced in the where clause without also permitting NULLs from that table - but due the OR, all conditions are being considered for the result. Hence a 3 table inner join query more or less.

Query 3, you have simplified the inner UNION ALL query to 2 simple inner joins

But I suspect there will be more to it because and A.IndigoClientId=@TenantId indicates you have not revealed the full queries anyway.

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
  • sorry I removed that condition and A.IndigoClientId=@TenantId thought that was not useful. Updated my question. – Bhaskar Aug 27 '14 at 07:13
  • too late :) I know it existed now. The basic problem is we can make some educated guesses but that is all we can do. You would need to examine the actual execution plans of each run to really understand the differences. – Paul Maxwell Aug 27 '14 at 07:17
  • Ok :) I will comeback with execution plans. – Bhaskar Aug 27 '14 at 07:23
0

The JOIN to TAdditionalOwner is using

TPolicyBusiness.PolicyBusinessId = TAdditionalOwner.PolicyBusinessId 

where the JOIN to TPolicyOwner is using the

TPolicyBusiness.PolicyDetailId = TPolicyOwner.PolicyDetailId

Check that there is a corresponding index for the PolicyBusinessId.

In the 2-way JOIN, that is part of the UNION, the smaller TAdditionalOwner table will be optimised if there is not an index for it to refer to in TPolicyBusiness, due to the small size. The server will still do a table scan, but use the values from the smaller table and see if they are in the big table somewhere. If there is no index, this optimisation will disappear quite quickly as the small table grows.

Given that you are not referring to either B or C in the SELECT you can simply to this

SELECT DISTINCT A.PolicyBusinessId, A.PolicyDetailId 
FROM TPolicyBusiness A
LEFT JOIN TPolicyOwner B ON a.PolicyDetailId = b.PolicyDetailId AND b.CRMContactId = @CRMContactId 
LEFT JOIN TAdditionalOwner C on c.PolicyBusinessId = A.PolicyBusinessId AND C.CRMContactId = @CRMContactId

This way it will JOIN to either table, same as per your UNION, but without the OUTER select.

Either way ensure that the fields used are indexed.

simo.3792
  • 2,102
  • 1
  • 17
  • 29
-1

Union operator, it will take lot of time if the data is huge.try to avoid union operator.

please try below try to filter the data while joining.it will be much better.

Select distinct A.PolicyBusinessId, A.PolicyDetailId 
from TPolicyBusiness A
inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
and b.CRMContactId = @CRMContactId 
Left Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
where  (1=1 OR C.CRMContactId = @CRMContactId) 
Adi
  • 232
  • 1
  • 9
  • This isn't quite correct. If the contact is from the TAdditionalOwner only, then it won't JOIN to the TPolicyOwner....and therefore no result. – simo.3792 Aug 27 '14 at 07:33