0

I am able to get the correct results using UNION ALL - but have been trying to get this to work using GROUPING SETS without success. Is this possible?

Example SQL:

--Build data table
WITH TABLE_1 AS ( --data
SELECT 1 T1_ID, 2 VAL UNION
SELECT 2 T1_ID, 4 VAL UNION
SELECT 3 T1_ID, 6 VAL UNION
SELECT 4 T1_ID, 8 VAL UNION
SELECT 5 T1_ID, 10 VAL
),
TABLE_2 AS ( --first level join
SELECT 1 T1_ID, 'AA' T2_ID UNION
SELECT 2 T1_ID, 'AA' T2_ID UNION
SELECT 3 T1_ID, 'BB' T2_ID UNION
SELECT 4 T1_ID, 'BB' T2_ID UNION
SELECT 5 T1_ID, 'BB' T2_ID
),
TABLE_3 AS ( --second level join
SELECT 1 T1_ID, 'CCC' T3_ID UNION
SELECT 2 T1_ID, 'CCC' T3_ID UNION
SELECT 3 T1_ID, 'CCC' T3_ID UNION
SELECT 4 T1_ID, 'CCC' T3_ID UNION
SELECT 5 T1_ID, 'CCC' T3_ID UNION
SELECT 1 T1_ID, 'DDD' T3_ID UNION
SELECT 2 T1_ID, 'DDD' T3_ID UNION
SELECT 3 T1_ID, 'DDD' T3_ID UNION
SELECT 4 T1_ID, 'DDD' T3_ID UNION
SELECT 5 T1_ID, 'DDD' T3_ID 
)

This is what the data table looks like fully joined -


SELECT 
TABLE_1.T1_ID, 
TABLE_1.VAL, 
TABLE_2.T2_ID,
TABLE_3.T3_ID
FROM TABLE_1
    LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
    LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID

T1_ID   VAL  T2_ID  T3_ID
1       2    AA     CCC
1       2    AA     DDD
2       4    AA     CCC
2       4    AA     DDD
3       6    BB     CCC
3       6    BB     DDD
4       8    BB     CCC
4       8    BB     DDD
5       10   BB     CCC
5       10   BB     DDD

--This gives me the results I need (Sum VAL by distinct AA, BB, AA+BB, nulls) -

SELECT 
    T2_ID,
    NULL T3_ID,
    SUM(VAL) TOTAL
FROM TABLE_1
    LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
GROUP BY T2_ID
UNION ALL
SELECT 
    NULL T2_ID, 
    T3_ID,
    SUM(VAL) TOTAL
FROM TABLE_1
    LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID
GROUP BY T3_ID
UNION ALL
SELECT 
    T2_ID,
    T3_ID,
    SUM(VAL) TOTAL
FROM TABLE_1
    LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
    LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID
GROUP BY T2_ID, T3_ID
UNION ALL
    SELECT 
    NULL T2_ID,
    NULL T3_ID,
    SUM(VAL) TOTAL
FROM TABLE_1

CORRECT RESULTS:

T2_ID   T3_ID   TOTAL
AA      NULL    6
BB      NULL    24
NULL    CCC     30
NULL    DDD     30
AA      CCC     6
BB      CCC     24
AA      DDD     6
BB      DDD     24
NULL    NULL    30

When I try using GROUPING SETS -

SELECT 
    GROUPING_ID (T2_ID, T3_ID) GRP_ID,
    T2_ID,
    T3_ID,
    SUM(VAL) TOTAL
FROM TABLE_1
    LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
    LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID
GROUP BY 
    GROUPING SETS ((),
            T2_ID,
            T3_ID,
            (T2_ID,T3_ID))

My results are doubling where GRP_ID = 1 or 3 shown below

-----------------------------------------
GRP_ID  T2_ID   T3_ID   TOTAL
0       AA      CCC     6
0       BB      CCC     24
2       NULL    CCC     30
0       AA      DDD     6
0       BB      DDD     24
2       NULL    DDD     30
1       AA      NULL    12
1       BB      NULL    48
3       NULL    NULL    60

UNION ALL approach works - but I keep thinking there should be a way to use GROUPING SETS.

EDIT: I need to also add the reason why I'm looking for a grouping set solution - the real-world version of this problem is not limited to two joins used to drive the group conditions (this can come from many joins), and any of them can end up duplicating rows like the TABLE_3 in the example. With this, the real-world grouping would look more like this, for example:

   grouping sets ((),
                  (a, b)
                  (a, c)
                  (a, d)
                  (a, b, c)
                  (a, b, c, d)
                  a, 
                  b, 
                  c, 
                  d)
                  
zapf2023
  • 19
  • 4
  • Thanks for providing test data in an easily consumable form. The basic problem (as you probably realise but didn't point out specifically) is that `TABLE_3` can have many rows for a given `T1_ID` which is why the "multiplying out" for T2 aggregate occurs – Martin Smith Feb 17 '23 at 22:59
  • Yes, I did realize that - but thank you for pointing that out! – zapf2023 Feb 17 '23 at 23:00
  • What you are trying to accomplish is very close to `GROUP BY ... WITH CUBE`, but grouping does not behave in a manner you are looking for. The grouping combinations do not eliminate the grouped tables, they combine the results. Since each TABLE_1 row has two matching TABLE_3 rows, the value is being counted twice, even when Table_3 is grouped. There might be a way to build a guery `FROM TABLE_1 JOIN ((TABLE_2 UNION filler) CROSS JPOIN (TABLE_2 UNION filler))` (pseudocode - not real syntax). – T N Feb 17 '23 at 23:07
  • I think the best that you can get out of grouping sets is to eliminate one branch of your `UNION ALL` - with `GROUP BY GROUPING SETS ((), T2_ID)` instead of `GROUP BY T2_ID` and only if `TABLE_2` is constrained to not have any duplicates for `T1_ID` (as the example data might imply) – Martin Smith Feb 17 '23 at 23:27
  • Thank you for the feedback - sticking with the union all is fine, performance wise, it's not an issue. Was hoping a version using grouping sets would be possible, as it will be a much simpler query. – zapf2023 Feb 17 '23 at 23:49

2 Answers2

1

It's not pretty, but some might call it elegant.

I don't think I would recommend it, because one day, someone will have to come in for maintenance and figure it out. Then they will hunt you down.

SELECT T2_ID, T3_ID, SUM(VAL) TOTAL
FROM TABLE_1 T1
JOIN (
    SELECT T1_ID, T2_ID FROM TABLE_2
    UNION ALL
    SELECT T1_ID, NULL FROM TABLE_1 -- Null fillers
) T2 ON T2.T1_ID = T1.T1_ID
JOIN (
    SELECT T1_ID, T3_ID FROM TABLE_3
    UNION ALL
    SELECT T1_ID, NULL FROM TABLE_1 -- Null fillers
) T3 ON T3.T1_ID = T1.T1_ID
GROUP BY T2_ID, T3_ID
ORDER BY
    CASE WHEN T2_ID IS NULL THEN 1 ELSE 2 END,
    CASE WHEN T3_ID IS NULL THEN 1 ELSE 2 END,
    T2_ID,
    T3_ID

Sample results:

T2_ID T3_ID TOTAL
null null 30
null CCC 30
null DDD 30
AA null 6
BB null 24
AA CCC 6
AA DDD 6
BB CCC 24
BB DDD 24

See this db<>fiddle.

T N
  • 4,322
  • 1
  • 5
  • 18
1

CORRECT ANSWER - this works for N grouping levels and combinations -

The "SUB" CTE is needed to assign TABLE_1 row value for each individual grouping level, so that the summarizations are correct for each group level, which is done in the main/final query.

But there is a performance hit after all; with my actual data set, the UNION ALL was completing in 17 seconds - the GROUPING SETS option in 33 seconds.

SUB AS (
    SELECT 
        GROUPING_ID (TABLE_2.T2_ID, TABLE_3.T3_ID) GROUP_ID,
        TABLE_1.T1_ID,
        TABLE_2.T2_ID, 
        TABLE_3.T3_ID,
        MAX(TABLE_1.VAL) VAL
    FROM TABLE_1
        LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
        LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID
    GROUP BY
        GROUPING SETS((),
                TABLE_2.T2_ID,
                TABLE_3.T3_ID,  
                (TABLE_2.T2_ID, TABLE_3.T3_ID)
                ),
        TABLE_1.T1_ID
)

--FINAL
SELECT 
    GROUP_ID,
    T2_ID, 
    T3_ID,
    SUM(VAL) TOTAL_VAL
FROM SUB
GROUP BY
    GROUP_ID,
    T2_ID, 
    T3_ID


(OLD ANSWER) - gets the same result as example table, BUT when other joins multiply the rows, it is not correct.

SELECT 
    GRP_ID,
    T2_ID,
    T3_ID,
    SUM(TOTAL)
FROM (
    SELECT DISTINCT
        GROUPING_ID (T2_ID, T3_ID) GRP_ID,
        SEMI_FINAL.T2_ID,
        SEMI_FINAL.T3_ID,
        SUM(VAL) TOTAL
    FROM (
        SELECT TABLE_1.*, T2_ID, T3_ID,
            RANK() OVER (ORDER BY TABLE_2.T2_ID) RANK2,
            RANK() OVER (ORDER BY TABLE_3.T3_ID) RANK3
        FROM TABLE_1
            LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
            LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID
    ) SEMI_FINAL
    GROUP BY 
        GROUPING SETS ((),
                    SEMI_FINAL.T3_ID,
                    SEMI_FINAL.T2_ID,
                    (SEMI_FINAL.T2_ID, SEMI_FINAL.T3_ID))
                    , RANK2
                    , RANK3
 ) FINAL_TABLE          
GROUP BY 
    GRP_ID,
    T3_ID,
    T2_ID

And this returns the correct results -

GRP_ID  T2_ID   T3_ID   TOTAL
0       AA      CCC     6
0       BB      CCC     24
0       AA      DDD     6
0       BB      DDD     24
1       AA      NULL    6
1       BB      NULL    24
2       NULL    CCC     30
2       NULL    DDD     30
3       NULL    NULL    30
zapf2023
  • 19
  • 4