I have a master table and a reference table as below.

WITH MAS as (
SELECT 10 as CUSTOMER_ID, 1 PROCESS_ID, 44 PROCESS_TYPE, 200 as AMOUNT FROM DUAL UNION ALL
SELECT 10 as CUSTOMER_ID, 1 PROCESS_ID, 44 PROCESS_TYPE, 250 as AMOUNT FROM DUAL UNION ALL
SELECT 10 as CUSTOMER_ID, 2 PROCESS_ID, 45 PROCESS_TYPE, 300 as AMOUNT FROM DUAL UNION ALL
SELECT 10 as CUSTOMER_ID, 2 PROCESS_ID, 45 PROCESS_TYPE, 350 as AMOUNT FROM DUAL 
), REFTAB as (
SELECT 44 PROCESS_TYPE, 'A' GROUP_ID FROM DUAL UNION ALL 
SELECT 44 PROCESS_TYPE, 'B' GROUP_ID FROM DUAL UNION ALL
SELECT 45 PROCESS_TYPE, 'C' GROUP_ID FROM DUAL UNION ALL 
SELECT 45 PROCESS_TYPE, 'D' GROUP_ID FROM DUAL
) SELECT ...

My first select statement which works correctly is this one:

SELECT CUSTOMER_ID,
       SUM(AMOUNT) as AMOUNT1,
       SUM(CASE WHEN PROCESS_TYPE IN (SELECT PROCESS_TYPE FROM REFTAB WHERE GROUP_ID = 'A') 
                THEN AMOUNT ELSE NULL END) as AMOUNT2,
       COUNT(CASE WHEN PROCESS_TYPE IN (SELECT PROCESS_TYPE FROM REFTAB WHERE GROUP_ID = 'D') 
                  THEN 1 ELSE NULL END) as COUNT1
   FROM MAS
  GROUP BY CUSTOMER_ID

However, to address a performance issue, I changed it to this select statement:

SELECT CUSTOMER_ID,
       SUM(AMOUNT) as AMOUNT1,
       SUM(CASE WHEN GROUP_ID = 'A' THEN AMOUNT ELSE NULL END) as AMOUNT2,
       COUNT(CASE WHEN GROUP_ID = 'D' THEN 1 ELSE NULL END) as COUNT1
   FROM MAS A
   LEFT JOIN REFTAB B ON A.PROCESS_TYPE = B.PROCESS_TYPE
  GROUP BY CUSTOMER_ID

For the AMOUNT2 and COUNT1 columns, the values stay the same. But for AMOUNT1, the value is multiplied because of the join with the reference table.

I know I can add 1 more left join with an additional join condition on GROUP_ID. But that won't be any different from using a subquery.

Any idea how to make the query work with just 1 left join while not multiplying the AMOUNT1 value?

sstan
Deniz

3 Answers

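The usual approach is to aggregate the values before the join, so that the join cannot multiply rows. Alternatively, assuming the rest of the query is correct, you can keep the single join and use conditional aggregation so that AMOUNT1 counts each MAS row only once: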

SELECT CUSTOMER_ID,
       -- Count each MAS row once: on its first REFTAB match, or on the
       -- NULL-extended row when it has no match at all (seqnum IS NULL)
       SUM(CASE WHEN COALESCE(seqnum, 1) = 1 THEN AMOUNT END) as AMOUNT1,
       SUM(CASE WHEN GROUP_ID = 'A' THEN AMOUNT ELSE NULL END) as AMOUNT2,
       COUNT(CASE WHEN GROUP_ID = 'D' THEN 1 ELSE NULL END) as COUNT1
FROM MAS A LEFT JOIN
     (SELECT B.*, ROW_NUMBER() OVER (PARTITION BY PROCESS_TYPE ORDER BY PROCESS_TYPE) as seqnum
      FROM REFTAB B
     ) B
     ON A.PROCESS_TYPE = B.PROCESS_TYPE
GROUP BY CUSTOMER_ID;

AMOUNT1 then counts each MAS row only once, which ignores the duplicates created by the join.
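
A minimal sketch of the aggregate-first approach, using the sample data from the question: REFTAB is collapsed to one row per PROCESS_TYPE before the join, so a single left join cannot multiply MAS rows. The names REF_FLAGS, IN_GROUP_A and IN_GROUP_D are made up for this illustration and are not part of the original schema.

```sql
WITH MAS AS (
  SELECT 10 AS CUSTOMER_ID, 1 PROCESS_ID, 44 PROCESS_TYPE, 200 AS AMOUNT FROM DUAL UNION ALL
  SELECT 10 AS CUSTOMER_ID, 1 PROCESS_ID, 44 PROCESS_TYPE, 250 AS AMOUNT FROM DUAL UNION ALL
  SELECT 10 AS CUSTOMER_ID, 2 PROCESS_ID, 45 PROCESS_TYPE, 300 AS AMOUNT FROM DUAL UNION ALL
  SELECT 10 AS CUSTOMER_ID, 2 PROCESS_ID, 45 PROCESS_TYPE, 350 AS AMOUNT FROM DUAL
), REFTAB AS (
  SELECT 44 PROCESS_TYPE, 'A' GROUP_ID FROM DUAL UNION ALL
  SELECT 44 PROCESS_TYPE, 'B' GROUP_ID FROM DUAL UNION ALL
  SELECT 45 PROCESS_TYPE, 'C' GROUP_ID FROM DUAL UNION ALL
  SELECT 45 PROCESS_TYPE, 'D' GROUP_ID FROM DUAL
), REF_FLAGS AS (
  -- Hypothetical helper: one row per PROCESS_TYPE, with a flag per group of interest.
  -- A flag is 1 if the type belongs to the group, NULL otherwise.
  SELECT PROCESS_TYPE,
         MAX(CASE WHEN GROUP_ID = 'A' THEN 1 END) AS IN_GROUP_A,
         MAX(CASE WHEN GROUP_ID = 'D' THEN 1 END) AS IN_GROUP_D
    FROM REFTAB
   GROUP BY PROCESS_TYPE
)
SELECT M.CUSTOMER_ID,
       SUM(M.AMOUNT) AS AMOUNT1,                                   -- every MAS row exactly once
       SUM(CASE WHEN F.IN_GROUP_A = 1 THEN M.AMOUNT END) AS AMOUNT2,
       COUNT(F.IN_GROUP_D) AS COUNT1                               -- COUNT skips NULL flags
  FROM MAS M
  LEFT JOIN REF_FLAGS F
    ON F.PROCESS_TYPE = M.PROCESS_TYPE
 GROUP BY M.CUSTOMER_ID;
```

With the sample data this should return one row: CUSTOMER_ID 10, AMOUNT1 1100, AMOUNT2 450, COUNT1 2. Whether it outperforms the other variants depends on the real table sizes and indexes.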

Gordon Linoff

"I know I can add 1 more left join with an additional join condition on GROUP_ID. But that won't be any different from using a subquery."

You'd be surprised. Having 2 left joins instead of subqueries in the SELECT gives the optimizer more ways of optimizing the query. I would still try it:

select m.customer_id,
       sum(m.amount) as amount1,
       sum(case when grpA.group_id is not null then m.amount end) as amount2,
       count(grpD.group_id) as count1
  from mas m
  left join reftab grpA
    on grpA.process_type = m.process_type
   and grpA.group_id = 'A'
  left join reftab grpD
    on grpD.process_type = m.process_type
   and grpD.group_id = 'D'
 group by m.customer_id

You can also try this query, which uses the SUM() analytic function to calculate the amount1 value before the join to avoid the duplicate value problem:

select m.customer_id,
       m.customer_sum as amount1,
       sum(case when r.group_id = 'A' then m.amount end) as amount2,
       count(case when r.group_id = 'D' then 'X' end) as count1
  from (select customer_id,
               process_type,
               amount,
               sum(amount) over (partition by customer_id) as customer_sum
          from mas) m
  left join reftab r
    on r.process_type = m.process_type
 group by m.customer_id,
          m.customer_sum

You can test both options, and see which one performs better.
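
One way to compare them is to look at the execution plans. The sketch below assumes the standard Oracle `EXPLAIN PLAN` statement and the `DBMS_XPLAN.DISPLAY` table function with a default `PLAN_TABLE`. It uses the sample rows from the question only so that the statement is runnable as written; a plan over `DUAL` says nothing about real performance, so point it at your actual tables.

```sql
-- Store the plan for the two-join variant (sample data inlined for illustration only)
EXPLAIN PLAN FOR
WITH mas AS (
  SELECT 10 AS customer_id, 44 AS process_type, 200 AS amount FROM dual UNION ALL
  SELECT 10, 44, 250 FROM dual UNION ALL
  SELECT 10, 45, 300 FROM dual UNION ALL
  SELECT 10, 45, 350 FROM dual
), reftab AS (
  SELECT 44 AS process_type, 'A' AS group_id FROM dual UNION ALL
  SELECT 44, 'B' FROM dual UNION ALL
  SELECT 45, 'C' FROM dual UNION ALL
  SELECT 45, 'D' FROM dual
)
select m.customer_id,
       sum(m.amount) as amount1,
       sum(case when grpA.group_id is not null then m.amount end) as amount2,
       count(grpD.group_id) as count1
  from mas m
  left join reftab grpA
    on grpA.process_type = m.process_type
   and grpA.group_id = 'A'
  left join reftab grpD
    on grpD.process_type = m.process_type
   and grpD.group_id = 'D'
 group by m.customer_id;

-- Show the plan that was just stored
SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Repeat the same two steps for the other query and compare the estimated cost and join operations, then confirm with actual timings on realistic data.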

sstan

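Starting from your original query, replacing the IN subqueries with correlated EXISTS subqueries may improve performance; check the execution plan to confirm. Also think about the NULLs: SUM ignores them, so AMOUNT2 comes back as NULL rather than 0 when no row matches. If you want 0 in that case, use ELSE 0 in the SUM. Keep ELSE NULL in the COUNT, though, because COUNT counts every non-NULL value.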

SELECT CUSTOMER_ID,
       SUM(AMOUNT) as AMOUNT1,
       SUM(CASE WHEN EXISTS(SELECT 1 FROM REFTAB WHERE REFTAB.GROUP_ID = 'A' AND REFTAB.PROCESS_TYPE = MAS.PROCESS_TYPE)
                THEN AMOUNT ELSE NULL END) as AMOUNT2,
       COUNT(CASE WHEN EXISTS(SELECT 1 FROM REFTAB WHERE REFTAB.GROUP_ID = 'D' AND REFTAB.PROCESS_TYPE = MAS.PROCESS_TYPE) 
                  THEN 1 ELSE NULL END) as COUNT1
   FROM MAS
  GROUP BY CUSTOMER_ID
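
A small illustration of how ELSE NULL and ELSE 0 behave inside SUM and COUNT. The inline table T and its column X are invented for this example only.

```sql
WITH T AS (
  SELECT 1 AS X FROM DUAL UNION ALL
  SELECT 2 AS X FROM DUAL
)
SELECT SUM(CASE WHEN X = 1 THEN X ELSE NULL END)   AS SUM_MATCH,        -- 1: NULLs are skipped
       SUM(CASE WHEN X = 9 THEN X ELSE NULL END)   AS SUM_NO_MATCH,     -- NULL: nothing to add up
       SUM(CASE WHEN X = 9 THEN X ELSE 0 END)      AS SUM_NO_MATCH_0,   -- 0
       COUNT(CASE WHEN X = 1 THEN 1 ELSE NULL END) AS COUNT_ELSE_NULL,  -- 1: only the matching row
       COUNT(CASE WHEN X = 1 THEN 1 ELSE 0 END)    AS COUNT_ELSE_ZERO   -- 2: 0 is non-NULL, so every row counts
  FROM T;
```

So ELSE 0 only changes a SUM when no row matches, while in a COUNT it changes the result for every non-matching row.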

Adam Martin