1

In a Table, I need to combine 2 records into one record, based on columns (in this case column 1.WORK_ORDER_NUM, 2.ESN 3.PLANT 4. REMD_PART_NUM 5. REMD_PART_SERIAL). If these 5 columns are equal then I have to use Aggregate function in other column to make it into one record.[In this case the columns are LLP_TRACKD_PART_IND,REMD_PART_TSN and REMD_PART_CSN].

This is what I've tried:

SELECT decode (PLANT ,'ECL','CELMA','EDS','CELMA',PLANT)PLANT,
 COUNT(*) RECORD_COUNT,
 COUNT(DISTINCT OFF.REMD_PART_NUM) REMD_PART_NUM_COUNT,
 COUNT(DISTINCT OFF.REMD_PART_SERIAL) REMD_PART_SER_NUM_COUNT,
  COUNT(DECODE(LLP_TRACKD_PART_IND,'LL',LLP_TRACKD_PART_IND,NULL)) LL_COUNT,
 COUNT(DECODE(LLP_TRACKD_PART_IND,'LR',LLP_TRACKD_PART_IND,NULL)) LR_COUNT,
 COUNT(DECODE(LLP_TRACKD_PART_IND,'TR',LLP_TRACKD_PART_IND,NULL)) TR_COUNT,
 SUM(OFF.REMD_PART_QTY) TOTAL_REMD_PART_QTY, 
  SUM(decode(LLP_TRACKD_PART_IND,null,0,
     CASE
       WHEN REGEXP_LIKE(REMD_PART_TSN, '^-?\d+(\.\d+)?$')
            THEN CAST(REMD_PART_TSN AS NUMBER)
        ELSE 0
    END
  )) TOTAL_TSN,
 SUM(decode(LLP_TRACKD_PART_IND,null,0,
    CASE
        WHEN REGEXP_LIKE(REMD_PART_CSN, '^-?\d+(\.\d+)?$')
            THEN CAST(REMD_PART_CSN AS NUMBER)
        ELSE 0
    END
 )) TOTAL_CSN
FROM (with t as ( SELECT distinct PLANT,
WORK_ORDER_NUM,ESN,REMD_PART_NUM,REMD_PART_SERIAL,REMD_PART_IIN,
LLP_TRACKD_PART_IND,
REMD_PART_QTY,REMD_PART_TSN,REMD_PART_CSN,REMD_PART_TSO,REMD_PART_CSO
,REMD_PART_TSC,REMD_PART_CSC,REMD_CYCLE_REMAIN          
FROM <TABLE1> 
WHERE 
REMD_PART_NUM is not null 
)
select DISTINCT PLANT,WORK_ORDER_NUM,ESN,REMD_PART_NUM,REMD_PART_SERIAL 
,REMD_PART_IIN
 ,(select max(LLP_TRACKD_PART_IND) from t bb  where aa.PLANT=bb.PLANT and 
aa.WORK_ORDER_NUM=bb.WORK_ORDER_NUM 
 and aa.ESN=bb.ESN 
  and aa.REMD_PART_NUM=bb.REMD_PART_NUM
  and aa.REMD_PART_SERIAL=bb.REMD_PART_SERIAL) LLP_TRACKD_PART_IND
 ,REMD_PART_QTY
 ,(select max(REMD_PART_TSN) from t bb  where aa.PLANT=bb.PLANT and 
aa.WORK_ORDER_NUM=bb.WORK_ORDER_NUM 
 and aa.ESN=bb.ESN 
 and aa.REMD_PART_NUM=bb.REMD_PART_NUM
 and aa.REMD_PART_SERIAL=bb.REMD_PART_SERIAL) REMD_PART_TSN
 ,
(select max(REMD_PART_CSN) from t bb  where aa.PLANT=bb.PLANT and             
aa.WORK_ORDER_NUM=bb.WORK_ORDER_NUM 
  and aa.ESN=bb.ESN 
  and aa.REMD_PART_NUM=bb.REMD_PART_NUM
  and aa.REMD_PART_SERIAL=bb.REMD_PART_SERIAL) REMD_PART_CSN
  from t aa) OFF
WHERE 
REMD_PART_NUM is not null
GROUP BY decode (PLANT ,'ECL','CELMA','EDS','CELMA',PLANT) 

And it takes around 8 hours to complete. Is there any other way to complete it faster. It took 8 hours to complete

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Fine tuning? Are you using MySQL or Oracle? (Don't tag products not involved.) – jarlh Dec 07 '17 at 10:37
  • @jarlh It's Oracle. – Tim Biegeleisen Dec 07 '17 at 10:37
  • It's Oracle. @jarlh – Selvathalapathy S Dec 07 '17 at 10:42
  • Look at the execution plan to determine where oracle is spending all its time working things out. ps; as far as queries go, it looks naaasty - queries that have a nested subquery between select and from always raise alarm bells for me. I think you might do better not by asking us to fix the query you wrote, but by giving us example inputs and output and asking for help to write a new query in a different way, to solve the problem – Caius Jard Dec 07 '17 at 11:36

1 Answers1

1

Start with a better formatting of the query, it will allow you to easily understand the code and notice repeated patterns:

SELECT decode (PLANT ,'ECL','CELMA','EDS','CELMA',PLANT)PLANT,
    COUNT(*) RECORD_COUNT,
    COUNT(DISTINCT OFF.REMD_PART_NUM) REMD_PART_NUM_COUNT,
    COUNT(DISTINCT OFF.REMD_PART_SERIAL) REMD_PART_SER_NUM_COUNT,
    COUNT(DECODE(LLP_TRACKD_PART_IND,'LL',LLP_TRACKD_PART_IND,NULL)) LL_COUNT,
    COUNT(DECODE(LLP_TRACKD_PART_IND,'LR',LLP_TRACKD_PART_IND,NULL)) LR_COUNT,
    COUNT(DECODE(LLP_TRACKD_PART_IND,'TR',LLP_TRACKD_PART_IND,NULL)) TR_COUNT,
    SUM(OFF.REMD_PART_QTY) TOTAL_REMD_PART_QTY, 
    SUM(decode(LLP_TRACKD_PART_IND,null,0,
            CASE
                WHEN REGEXP_LIKE(REMD_PART_TSN, '^-?\d+(\.\d+)?$')
                THEN CAST(REMD_PART_TSN AS NUMBER)
                ELSE 0
            END
        )) TOTAL_TSN,
    SUM(decode(LLP_TRACKD_PART_IND,null,0,
            CASE
                WHEN REGEXP_LIKE(REMD_PART_CSN, '^-?\d+(\.\d+)?$')
                THEN CAST(REMD_PART_CSN AS NUMBER)
                ELSE 0
            END
        )) TOTAL_CSN
FROM (
    with t as ( 
        SELECT distinct PLANT, WORK_ORDER_NUM,ESN,REMD_PART_NUM,REMD_PART_SERIAL,REMD_PART_IIN,
                LLP_TRACKD_PART_IND,
                REMD_PART_QTY,REMD_PART_TSN,REMD_PART_CSN,REMD_PART_TSO,REMD_PART_CSO
                ,REMD_PART_TSC,REMD_PART_CSC,REMD_CYCLE_REMAIN          
        FROM <TABLE1> 
        WHERE REMD_PART_NUM is not null 
    )
    select DISTINCT PLANT,WORK_ORDER_NUM,ESN,REMD_PART_NUM,REMD_PART_SERIAL 
            ,REMD_PART_IIN
            ,(  select max(LLP_TRACKD_PART_IND) 
                from t bb  
                where aa.PLANT=bb.PLANT 
                    and aa.WORK_ORDER_NUM=bb.WORK_ORDER_NUM 
                    and aa.ESN=bb.ESN 
                    and aa.REMD_PART_NUM=bb.REMD_PART_NUM
                    and aa.REMD_PART_SERIAL=bb.REMD_PART_SERIAL
            ) LLP_TRACKD_PART_IND
            ,REMD_PART_QTY
            ,(  select max(REMD_PART_TSN) from t bb  
                where aa.PLANT=bb.PLANT 
                    and aa.WORK_ORDER_NUM=bb.WORK_ORDER_NUM 
                    and aa.ESN=bb.ESN 
                    and aa.REMD_PART_NUM=bb.REMD_PART_NUM
                    and aa.REMD_PART_SERIAL=bb.REMD_PART_SERIAL
            ) REMD_PART_TSN
            ,
            (   select max(REMD_PART_CSN) from t bb  
                where aa.PLANT=bb.PLANT 
                    and aa.WORK_ORDER_NUM=bb.WORK_ORDER_NUM 
                    and aa.ESN=bb.ESN 
                    and aa.REMD_PART_NUM=bb.REMD_PART_NUM
                    and aa.REMD_PART_SERIAL=bb.REMD_PART_SERIAL
            ) REMD_PART_CSN
    from t aa
) OFF
WHERE REMD_PART_NUM is not null
GROUP BY decode (PLANT ,'ECL','CELMA','EDS','CELMA',PLANT)
;

You will see that the below pattern is repeated 3 times (3 almost identical subqueries):

select max( some_field ) 
from t bb  
where aa.PLANT=bb.PLANT 
  and aa.WORK_ORDER_NUM=bb.WORK_ORDER_NUM 
  and aa.ESN=bb.ESN 
  and aa.REMD_PART_NUM=bb.REMD_PART_NUM
  and aa.REMD_PART_SERIAL=bb.REMD_PART_SERIAL

You will also easily see that there are 3 very expensive sort operations in this query - one DISTINCT in the innermost subquery, then another DISTINCT in another subquery, then finally GROUP BY operation (a kind of DISTINCT) in the top level query.


Looking only on your query one can easily eliminate one sort (DISTINCT) using analytic functions in this way:

SELECT * FROM (
    SELECT PLANT,WORK_ORDER_NUM,ESN,REMD_PART_NUM,REMD_PART_SERIAL,
           REMD_PART_IIN, REMD_PART_QTY,
           max( LLP_TRACKD_PART_IND ) over 
              (partition by PLANT, WORK_ORDER_NUM, ESN, REMD_PART_NUM, REMD_PART_SERIAL) 
           as LLP_TRACKD_PART_IND,
           max( REMD_PART_TSN )       over 
              (partition by PLANT, WORK_ORDER_NUM, ESN, REMD_PART_NUM, REMD_PART_SERIAL) 
           as REMD_PART_TSN,
           max( REMD_PART_CSN )       over 
              (partition by PLANT, WORK_ORDER_NUM, ESN, REMD_PART_NUM, REMD_PART_SERIAL) 
           as REMD_PART_CSN,
           row_number() over 
              (partition by PLANT, WORK_ORDER_NUM, ESN, REMD_PART_NUM, REMD_PART_SERIAL, REMD_PART_IIN, REMD_PART_QTY 
                   order by PLANT) as Rn
    FROM TABLE1
    WHERE REMD_PART_NUM is not null 
)
WHERE rn = 1

so the final query migh be:

SELECT decode (PLANT ,'ECL','CELMA','EDS','CELMA',PLANT)PLANT,
    COUNT(*) RECORD_COUNT,
    COUNT(DISTINCT OFF.REMD_PART_NUM) REMD_PART_NUM_COUNT,
    COUNT(DISTINCT OFF.REMD_PART_SERIAL) REMD_PART_SER_NUM_COUNT,
    COUNT(DECODE(LLP_TRACKD_PART_IND,'LL',LLP_TRACKD_PART_IND,NULL)) LL_COUNT,
    COUNT(DECODE(LLP_TRACKD_PART_IND,'LR',LLP_TRACKD_PART_IND,NULL)) LR_COUNT,
    COUNT(DECODE(LLP_TRACKD_PART_IND,'TR',LLP_TRACKD_PART_IND,NULL)) TR_COUNT,
    SUM(OFF.REMD_PART_QTY) TOTAL_REMD_PART_QTY, 
    SUM(decode(LLP_TRACKD_PART_IND,null,0,
            CASE
                WHEN REGEXP_LIKE(REMD_PART_TSN, '^-?\d+(\.\d+)?$')
                THEN CAST(REMD_PART_TSN AS NUMBER)
                ELSE 0
            END
        )) TOTAL_TSN,
    SUM(decode(LLP_TRACKD_PART_IND,null,0,
            CASE
                WHEN REGEXP_LIKE(REMD_PART_CSN, '^-?\d+(\.\d+)?$')
                THEN CAST(REMD_PART_CSN AS NUMBER)
                ELSE 0
            END
        )) TOTAL_CSN
FROM (
    SELECT PLANT,WORK_ORDER_NUM,ESN,REMD_PART_NUM,REMD_PART_SERIAL,REMD_PART_IIN, REMD_PART_QTY,
           max( LLP_TRACKD_PART_IND ) over (partition by PLANT, WORK_ORDER_NUM, ESN, REMD_PART_NUM, REMD_PART_SERIAL) as LLP_TRACKD_PART_IND,
           max( REMD_PART_TSN )       over (partition by PLANT, WORK_ORDER_NUM, ESN, REMD_PART_NUM, REMD_PART_SERIAL) as REMD_PART_TSN,
           max( REMD_PART_CSN )       over (partition by PLANT, WORK_ORDER_NUM, ESN, REMD_PART_NUM, REMD_PART_SERIAL) as REMD_PART_CSN,
           row_number()               over (partition by PLANT, WORK_ORDER_NUM, ESN, REMD_PART_NUM, REMD_PART_SERIAL, REMD_PART_IIN, REMD_PART_QTY 
                                            order by PLANT) as Rn
    FROM TABLE1
    WHERE REMD_PART_NUM is not null 
)
WHERE rn = 1
GROUP BY decode (PLANT ,'ECL','CELMA','EDS','CELMA',PLANT)

I feel that this query could be further optimized, but it would require insight into the structure of the tables and a knowledge of business requirements.


There are still some microoptimalizations possible, though.
This pattern:

SUM(decode(LLP_TRACKD_PART_IND,null,0,
        CASE
            WHEN REGEXP_LIKE(REMD_PART_TSN, '^-?\d+(\.\d+)?$')
            THEN CAST(REMD_PART_TSN AS NUMBER)
            ELSE 0
        END
    )) TOTAL_TSN,

can be replaced with this one:

coalesce( 
    SUM( 
        CASE
            WHEN REGEXP_LIKE(REMD_PART_TSN, '^-?\d+(\.\d+)?$')
            THEN CAST(REMD_PART_TSN AS NUMBER)
            ELSE 0
        END
    ), 0
)

since SUM ignores nulls, then checking for NULL value for each record is a waste. For a small number of records ( < 1 million) it does not matter, but for hundreds of millions of records you can achieve the effect of scale - say 0.05 ms for checking each record multiplied by 10.000.000 records can give 500 seconds.

krokodilko
  • 35,300
  • 7
  • 55
  • 79