
We are on Teradata 14. Two tables are being left outer joined (LOJ). Each table's PI has only 1.5% skew, but when either table is redistributed on the other's join key, the spool ends up about 98% skewed and the query simply hangs on the merge join step.

sel A.x, count('1') from A left outer join B on A.x = B.y where B.y is NULL group by A.x;

This is production code (written in the past) that now fails. I can rewrite it using an exclusion join and it works just fine, but changing the code is not an option. The plan redistributes B by A.x, so the spool built from B.* plus A.x (A.x is the PI) has 98% skew; the same happens the other way around with A.* plus B.y (B.y being the redistributing column). Stats are up to date.

Without changing the original query, the only other thing I could do is run a diagnostic statement before it, which avoids the redistribution step (the query then runs in under 5 seconds). The other thing I could consider is dropping stats on the join column (?), in the hope that the optimizer would stop redistributing one table by the other.

My question: how can I get the optimizer to choose the alternate plan it produced when I ran the diagnostic statement? Per PDCR the tables being joined are the same size as before, and the query used to run (though with a good amount of spool); now it hangs. This is a small 50-AMP system.
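For reference, this is roughly what the exclusion-join rewrite looks like (a sketch only, using the simplified A/B/x/y names from the query above; as noted, changing the production code is not an option):

-- hypothetical exclusion-join form of the same check (simplified names);
-- NOT EXISTS lets the optimizer use an exclusion join instead of LOJ + IS NULL
SELECT A.x,
       COUNT('1')
FROM   A
WHERE  NOT EXISTS (SELECT 1
                   FROM   B
                   WHERE  B.y = A.x)
GROUP  BY A.x;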

Update to the comments: here is the explain WITHOUT the diagnostic. It hangs on the merge join step.

Explain 
--diagnostic verboseexplain on for session ; 

SELECT      F.keyX   ,
          COUNT('1')
FROM    DB.TB1 F      LEFT OUTER JOIN DB.TB2 D 
    ON F.keyX = D.keyY 
where      (F.keyX IS NOT NULL 
    AND D.keyY IS NULL ) 
GROUP BY   F.keyX;

 This query is optimized using type 2 profile insert-sel, profileid
 10001.
  1) First, we lock a distinct DB."pseudo table" for read on a
     RowHash to prevent global deadlock for DB.D.
  2) Next, we lock a distinct DB."pseudo table" for read on a
     RowHash to prevent global deadlock for DB.F.
  3) We lock DB.D for read, and we lock DB.F for read.
  4) We do an all-AMPs RETRIEVE step from DB.F by way of an
     all-rows scan with no residual conditions into Spool 4 (all_amps)
     (compressed columns allowed), which is redistributed by hash code
     to all AMPs with hash fields ("DB.F.keyX")
     and Field1 ("DB.F.ROWID").  Then we do a SORT to order
     Spool 4 by row hash.  The size of Spool 4 is estimated with high
     confidence to be 270,555 rows (5,952,210 bytes).  Spool AsgnList:
     "Field_1" = "DB.F.ROWID",
     "keyX" = "DB.F.keyX".
  5) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 4 by way of an
          all-rows scan into Spool 5 (all_amps) (compressed columns
          allowed), which is duplicated on all AMPs with hash fields (
          "Spool_4.keyX") and Field1 ("Spool_4.Field_1").
          The size of Spool 5 is estimated with high confidence to be
          24,349,950 rows (535,698,900 bytes).  Spool AsgnList:
          "Field_1" = "Spool_4.Field_1",
          "keyX" = "Spool_4.keyX".
          The estimated time for this step is 0.42 seconds.
       2) We do an all-AMPs RETRIEVE step from DB.D by way of
          an all-rows scan with no residual conditions into Spool 6
          (all_amps) (compressed columns allowed), which is built
          locally on the AMPs with hash fields (
          "DB.D.keyY").  The size of Spool 6 is
          estimated with high confidence to be 307,616 rows (5,229,472
          bytes).  Spool AsgnList:
          "keyY" = "DB.D.keyY".
          The estimated time for this step is 0.02 seconds.
  6) We do an all-AMPs JOIN step (Global sum) from Spool 6 (Last Use)
     by way of an all-rows scan, which is joined to Spool 5 (Last Use)
     by way of an all-rows scan.  Spool 6 is used as the hash table and
     Spool 5 is used as the probe table in a joined using a single
     partition classical hash join, with a join condition of (
     "Spool_5.keyX = Spool_6.keyY").  The result
     goes into Spool 7 (all_amps) (compressed columns allowed), which
     is redistributed by hash code to all AMPs with hash fields (
     "Spool_5.keyX") and Field1 ("Spool_5.Field_1").  Then
     we do a SORT to order Spool 7 by row hash.  The size of Spool 7 is
     estimated with low confidence to be 270,555 rows (7,034,430 bytes).
     Spool AsgnList:
     "Field_1" = "Spool_5.Field_1",
     "keyY" = "{LeftTable}.keyY",
     "Field_3" = "Spool_5.keyX".
  7) We do an all-AMPs JOIN step (Global sum) from Spool 7 (Last Use)
     by way of a RowHash match scan, which is joined to Spool 4 (Last
     Use) by way of a RowHash match scan.  Spool 7 and Spool 4 are
     right outer joined using a merge join, with a join condition of (
     "Spool_7.Field_1 = Spool_4.Field_1").  The result goes into Spool
     8 (all_amps) (compressed columns allowed), which is built locally
     on the AMPs.  The size of Spool 8 is estimated with low confidence
     to be 270,555 rows (5,681,655 bytes).  Spool AsgnList:
     "keyY" = "{LeftTable}.keyY",
     "keyX" = "{RightTable}.keyX".
     The estimated time for this step is 0.49 seconds.
  8) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by way of
     an all-rows scan with a condition of ("Spool_8.keyY IS
     NULL") into Spool 3 (all_amps) (compressed columns allowed), which
     is built locally on the AMPs with Field1 ("25614").  The size of
     Spool 3 is estimated with low confidence to be 270,555 rows (
     5,140,545 bytes).  Spool AsgnList:
     "Field_1" = "25614",
     "Spool_3.keyX" = "{ Copy }keyX".
     The estimated time for this step is 0.01 seconds.
  9) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
     way of an all-rows scan, and the grouping identifier in field 1.
     Aggregate Intermediate Results are computed globally, then placed
     in Spool 11.  The size of Spool 11 is estimated with low
     confidence to be 296 rows (6,216 bytes).  The estimated time for
     this step is 0.04 seconds.
 10) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs with Field1 ("UniqueId").  The size of Spool 1 is
     estimated with low confidence to be 296 rows (8,584 bytes).  Spool
     AsgnList:
     "Field_1" = "UniqueId",
     "Field_2" = "Field_2 ,Field_3 (INTEGER),".
     The estimated time for this step is 0.01 seconds.
 11) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.

This is the new explain AFTER the diagnostic is run. This runs in about 5 seconds.

EXPLAIN SELECT  F.keyY,
          COUNT('1') 
FROM    DB.TB1 F      LEFT OUTER JOIN DB.TB2 D 
    ON F.keyY = D.keyY 
where      (D.keyY IS NULL)
GROUP BY   F.keyY;

 This query is optimized using type 2 profile insert-sel, profileid
 10001.
  1) First, we lock a distinct DB."pseudo table" for read on a
     RowHash to prevent global deadlock for DB.D.
  2) Next, we lock a distinct DB."pseudo table" for read on a
     RowHash to prevent global deadlock for DB.F.
  3) We lock DB.D for read, and we lock DB.F for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from DB.D by way of
          an all-rows scan with no residual conditions split into Spool
          4 (all_amps) with a condition of ("keyY IN (:*)")
          to qualify rows matching skewed rows of the skewed relation
          (compressed columns allowed) and Spool 5 (all_amps) with
          remaining rows (compressed columns allowed).  Spool 4 is
          duplicated on all AMPs with hash fields (
          "DB.D.keyY").  Then we do a SORT to order
          Spool 4 by row hash.  The size of Spool 4 is estimated with
          high confidence to be 180 rows.  Spool 5 is built locally on
          the AMPs with hash fields ("DB.D.keyY").
          The size of Spool 5 is estimated with high confidence to be
          307,614 rows.  Spool 4 AsgnList:
          "keyY" = "DB.D.keyY".
          Spool 5 AsgnList:
          "keyY" = "DB.D.keyY".
          The estimated time for this step is 0.02 seconds.
       2) We do an all-AMPs RETRIEVE step from DB.F by way of
          an all-rows scan with no residual conditions split into Spool
          6 (all_amps) with a condition of ("keyY IN (:*)")
          to qualify skewed rows (compressed columns allowed) and Spool
          7 (all_amps) with remaining rows (compressed columns allowed).
          Spool 6 is built locally on the AMPs with hash fields (
          "DB.F.keyY").  Then we do a SORT to
          order Spool 6 by row hash.  The size of Spool 6 is estimated
          with high confidence to be 231,040 rows.  Spool 7 is
          redistributed by hash code to all AMPs with hash fields (
          "DB.F.keyY").  Then we do a SORT to
          order Spool 7 by row hash.  The size of Spool 7 is estimated
          with high confidence to be 39,515 rows.  Spool 6 AsgnList:
          "keyY" = "DB.F.keyY".
          Spool 7 AsgnList:
          "keyY" = "DB.F.keyY".
          The estimated time for this step is 0.32 seconds.
  5) We do an all-AMPs JOIN step (Global sum) from Spool 4 (Last Use)
     by way of a RowHash match scan, which is joined to Spool 6 (Last
     Use) by way of a RowHash match scan.  Spool 4 and Spool 6 are
     right outer joined using a merge join, with a join condition of (
     "Spool_6.keyY = Spool_4.keyY").  The result
     goes into Spool 8 (all_amps), which is built locally on the AMPs.
     The size of Spool 8 is estimated with low confidence to be 231,040
     rows (4,851,840 bytes).  Spool AsgnList:
     "keyY" = "{LeftTable}.keyY",
     "keyY" = "{RightTable}.keyY".
     The estimated time for this step is 0.01 seconds.
  6) We do an all-AMPs JOIN step (Local sum) from Spool 5 (Last Use) by
     way of a RowHash match scan, which is joined to Spool 7 (Last Use)
     by way of a RowHash match scan.  Spool 5 and Spool 7 are
     right outer joined using a merge join, with a join condition of (
     "Spool_7.keyY = Spool_5.keyY").  The result
     goes into Spool 8 (all_amps), which is built locally on the AMPs.
     The size of Spool 8 is estimated with low confidence to be 39,515
     rows (829,815 bytes).  Spool AsgnList:
     "keyY" = "{LeftTable}.keyY",
     "keyY" = "{RightTable}.keyY".
     The estimated time for this step is 0.02 seconds.
  7) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by way of
     an all-rows scan with a condition of ("Spool_8.keyY IS
     NULL") into Spool 3 (all_amps) (compressed columns allowed), which
     is built locally on the AMPs with Field1 ("26086").  The size of
     Spool 3 is estimated with low confidence to be 270,555 rows (
     5,140,545 bytes).  Spool AsgnList:
     "Field_1" = "26086",
     "Spool_3.keyY" = "{ Copy }keyY".
     The estimated time for this step is 0.01 seconds.
  8) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
     way of an all-rows scan, and the grouping identifier in field 1.
     Aggregate Intermediate Results are computed globally, then placed
     in Spool 13.  The size of Spool 13 is estimated with low
     confidence to be 296 rows (6,216 bytes).  The estimated time for
     this step is 0.04 seconds.
  9) We do an all-AMPs RETRIEVE step from Spool 13 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs with Field1 ("UniqueId").  The size of Spool 1 is
     estimated with low confidence to be 296 rows (8,584 bytes).  Spool
     AsgnList:
     "Field_1" = "UniqueId",
     "Field_2" = "keyY ,Field_3 (INTEGER),".
     The estimated time for this step is 0.01 seconds.
 10) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.41 seconds.
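
Side note, in case it helps anyone reproducing this: a rough way to check how rows would skew if redistributed on a join column is to group by the target AMP. This is only a sketch; the DB.TB1/keyX names are assumptions taken from the explain above, and the same check can be run for DB.TB2.keyY.

-- rows per AMP if DB.TB1 were redistributed on keyX;
-- one AMP holding most rows corresponds to the 98% spool skew described above
SELECT HASHAMP(HASHBUCKET(HASHROW(F.keyX))) AS amp_no,
       COUNT(*)                             AS row_cnt
FROM   DB.TB1 F
GROUP  BY 1
ORDER  BY 2 DESC;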
  • Can you show the result of a `HELP STATS` and the actual query (plus Explain and/or DBQL Steps)? In the best case, run `DIAGNOSTIC HELPSTATS ON FOR SESSION;` and `DIAGNOSTIC VERBOSEEXPLAIN ON FOR SESSION;` before the Explain. – dnoeth Oct 30 '15 at 10:36
  • Howdy. Thanks for the peek. Helpstats was already on and it came back happy :). I am turning on verbose explain and posting the output here. I don't have step-level logging turned on. – Jason Reeves Oct 30 '15 at 10:43
  • Hi again. I am back with both explains updated in the main question. – Jason Reeves Oct 30 '15 at 11:10
  • The NOT NULL on the LOJ, I know, is stupid. It's the way the code was written in the past. Anyway, the optimizer plan will not change, since it's smart enough to realize that. – Jason Reeves Oct 30 '15 at 11:12
  • What's the Diagnostic you're talking about? The 2nd plan is based on the PRPD (partial redistribution, partial duplication) feature, which was implemented to avoid those heavily skewed spools... – dnoeth Oct 30 '15 at 12:03
  • No hash join. I was not aware of the latter terminology, but I figured from the altered explain that that was what it was trying to do. The problem is it won't do it by itself, even with full stats. Why does it not realize the bad spool skew? How do you get it to work right without the diagnostic? TD14 has extra stats options; would those help change its plan? How about dropping stats on the join cols? – Jason Reeves Oct 30 '15 at 12:46
  • Without stats the optimizer never chooses PRPD. Based on the 2nd Explain the two values are assumed to be skewed (180 rows = 90 AMPs * 2). Why can't you rewrite the logic? What are the relational PKs/FKs? – dnoeth Oct 30 '15 at 13:56
  • That'd be the first thing coming to mind, but this is the kind of check that is in production. Not sure who wrote that logic, but the same logic performs several similar checks; change one and you change them all, and people here are leery of production changes, so the closest I got them to running the odd code was with the diagnostic added. But coming back to what you said, "Without stats the optimizer never chooses PRPD. Based on the 2nd Explain the two values are assumed to be skewed (180 rows = 90 AMPs * 2)": why, then, can't it choose PRPD without the diagnostic statement? – Jason Reeves Oct 30 '15 at 14:26
  • Is there any stats option out there that will tell the optimizer that it would end up with a highly skewed spool if it redistributed? So please, "PRPD, don't redistribute all the way". – Jason Reeves Oct 30 '15 at 14:40
  • This might be a bug; try opening an incident with Teradata support. – dnoeth Oct 30 '15 at 14:42
  • OK, thanks. So then the diagnostic is the only solution without doing a code change. Thanks. – Jason Reeves Oct 30 '15 at 16:55
  • Can this be reproduced on another system in your environment? Per @dnoeth's comment, this could be an optimizer issue. It would certainly be worth the effort to collect a TSET for this query/view and submit it to Teradata support to be investigated. I wouldn't settle for having to rely on a diagnostic hint as the only acceptable solution here. – Rob Paller Oct 31 '15 at 03:31
  • Hi Rob, on the two points you asked: "Can this be reproduced on another system in your environment?" Yep, it checks out fine on a 600-AMP prod system. This is happening in a dev system, so it warrants less of a code change, but if the code changed (with a better query) the change would have to go all the way to prod. I am very well aware of bug reporting, TSET, etc., but most folks here came from non-Teradata environments and, least said, are not so Teradata savvy, so they have not yet heeded my request to open an incident with T@YS. – Jason Reeves Nov 02 '15 at 23:44
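
Following up on the stats discussion in the comments, a minimal sketch of the statements involved (the DB.TB1/TB2 and keyX/keyY names are assumptions taken from the explains above, and the stats refresh is only an experiment; as dnoeth notes, the optimizer only considers PRPD when the skewed values are visible in statistics, so dropping stats would rule it out):

-- session diagnostics mentioned in the comments
DIAGNOSTIC HELPSTATS ON FOR SESSION;
DIAGNOSTIC VERBOSEEXPLAIN ON FOR SESSION;

-- refresh single-column stats on the join columns (TD14 syntax; names assumed)
COLLECT STATISTICS COLUMN (keyX) ON DB.TB1;
COLLECT STATISTICS COLUMN (keyY) ON DB.TB2;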
