0

My understanding was that most OR conditions can be replaced where performance benefits with UNION ALL. But for these 2 queries the Count ('1') is not the same. Why is it so- am I missing something here. Should it not be the same . Can someone explain the disparity

SQL # 1

sel 
D1.COL_1_CD, D1.COL_1_DESC,
        D2.COL_2_CD, D2.COL_2_DESC, D3.COL_3_CD, D3.COL_3_DESC, D4.COL_4_CD,
        D4.COL_4_DESC, D5.COL_5_CD, D5.COL_5_DESC,
        d1.COL_1_CD_SYS_ID,
        d2.COL_2_CD_SYS_ID,
        d3.COL_3_CD_SYS_ID,
        d4.COL_4_CD_SYS_ID,
        d5.COL_5_CD_SYS_ID


from
 D1,
 D2,  D3,  D4,
         D5

        where
        D1.COL_1_CD1=D2.COL_2_CD1  
    and
        D2.COL_2_CD1=D3.COL_3_CD1 
    and 
        D4.COL_4_CD1=D5.COL_5_CD1 
        and 


 (D1.COL_1_CD in ('707') 
    or   D2.COL_2_CD in ('707') 
    or  D3.COL_3_CD in ('707') 
    or  D4.COL_4_CD in ('707') 
    or   D5.COL_5_CD in ('707') ) 

    group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

SQL # 2

sel 
D1.COL_1_CD, D1.COL_1_DESC,
        D2.COL_2_CD, D2.COL_2_DESC, D3.COL_3_CD, D3.COL_3_DESC, D4.COL_4_CD,
        D4.COL_4_DESC, D5.COL_5_CD, D5.COL_5_DESC,
        d1.COL_1_CD_SYS_ID,
        d2.COL_2_CD_SYS_ID,
        d3.COL_3_CD_SYS_ID,
        d4.COL_4_CD_SYS_ID,
        d5.COL_5_CD_SYS_ID


from
 D1,
 D2,  D3,  D4,
         D5

        where
        D1.COL_1_CD1=D2.COL_2_CD1  
    and
        D2.COL_2_CD1=D3.COL_3_CD1 
    and 
        D4.COL_4_CD1=D5.COL_5_CD1 
        and 


 D1.COL_1_CD in ('707') 

UNION ALL 

sel 
D1.COL_1_CD, D1.COL_1_DESC,
        D2.COL_2_CD, D2.COL_2_DESC, D3.COL_3_CD, D3.COL_3_DESC, D4.COL_4_CD,
        D4.COL_4_DESC, D5.COL_5_CD, D5.COL_5_DESC,
        d1.COL_1_CD_SYS_ID,
        d2.COL_2_CD_SYS_ID,
        d3.COL_3_CD_SYS_ID,
        d4.COL_4_CD_SYS_ID,
        d5.COL_5_CD_SYS_ID


from
 D1,
 D2,  D3,  D4,
         D5

        where
        D1.COL_1_CD1=D2.COL_2_CD1  
    and
        D2.COL_2_CD1=D3.COL_3_CD1 
    and 
        D4.COL_4_CD1=D5.COL_5_CD1 
        and 


   D2.COL_2_CD in ('707')  


UNION ALL 
.....<same query>
D3.COL_3_CD in ('707')

UNION ALL 

.....<same query>
D4.COL_4_CD in ('707')

UNION ALL 
.....<same query>
D5.COL_5_CD in ('707')

row counts are not same . What kind of OR logic can be converted as equivalent UNION ALL.

GregGalloway
  • 11,355
  • 3
  • 16
  • 47
user1874594
  • 2,277
  • 1
  • 25
  • 49
  • 1
    ***Can someone explain the disparity*** Assume data is D1.COL_1_CD of 707 and D2.COL_2_CD of 707 on same record in query 1. In a join only 1 record would be returned. In a UNION ALL 2 records will be returned. as the ALL prevents the distinct statement from executing Are you sure you want `union all` or just a `union` which performs a distinct (which is a performance hit but may still be faster than the or) – xQbert Oct 01 '15 at 20:40
  • Also 'in' can be slow . are they = instead of in? or what about inverting the order of your in statement... `'707' in (D1.Col_1_CD,D2.COL_2_CD, D3.COL_3_CD, D4.COL_4_CD,D5.COL_5_CD)` instead of all your OR's... on the first query. – xQbert Oct 01 '15 at 20:49
  • 'in' is equivalent to '=', no need to perpetuate that bizarre understanding. xQbert does explain the disparity properly though – Adam Martin Oct 01 '15 at 20:55
  • 707' in (D1.Col_1_CD,D2.COL_2_CD, D3.COL_3_CD, D4.COL_4_CD,D5.COL_5_CD) gives me syntax error. I would have LOVED to use it if it worked – user1874594 Oct 01 '15 at 21:10
  • 1
    `OR` can only be rewritten as `UNION` not `UNION ALL`. The only reason to do this might be an ORed join condition which result in a huge product join (which is always a indication of a strange data model). In your case it's just a `WHERE`-condition, should be much faster without UNION. Teradata doesn't support `'707' IN (cola,col)`, I don't know why, it's annoying... – dnoeth Oct 01 '15 at 21:30
  • yes. The larger query that contained this situation was optimimized using 2 approaches 1 ) with ( column list ) ( select clause that joins fact table with D table ) and continue the joins here. This LOOKED good but didnt work that well though it brought down CPU > 50% , still took 15 mins to run from original Cagnos user Q that took over 3 hrs 2) Create volatile table with Fact joined to D and SAME PI and Partition as Fact. This ran the Q in < 3 mins ( including VT creation + VT stats + main Q run). VT rocks !!! – user1874594 Oct 01 '15 at 23:02

0 Answers0