Can I get a specific column value based on an aggregated column?

Question

I am aggregating a table in AWS Glue (spark sql) to get the min of created_on and the sum of total. However I want to group by id, id2, id3 and to get the corresponding engment_partn_text value for the earliest created_on date. So that the first 3 rows would only show Phill but total still right.

I am running this query within aws glue.

I worked around this dividing this query and joining them back together but I would like to know if there is a more straightforward way.

Current outcome

id	id2	id3	created_on	engmt_prtnr_text	total
9045692e4557	6206435	XRS00664	9/12/2022	Rey	15
9045692e4557	6206435	XRS00664	5/10/2020	Phil	10
9045692e4557	6206435	XRS00664	9/12/2022	Jenny	5
4fefkb67z39f	6666269	zGR00364	4/20/2022	Dan	3
4fefkb67z39f	6666269	zGR00364	7/26/2026	HYghes PetYnXz	5
b765430374zk	6600464	NEB00023	2/2/2021	William	17
b765430374zk	6600464	NEB00023	2/2/2022	Jae	25
d376zze4be63	6262633	YSD07507	2/2/2022	Joseph	0
d376zze4be63	6262633	YSD07507	2/2/2021	Doe	20

Expected outcome

id	id2	id3	created_on	engmt_prtnr_text	total
9045692e4557	6206435	XRS00664	9/12/2022	Phil	30
4fefkb67z39f	6666269	zGR00364	4/20/2022	Dan	8
b765430374zk	6600464	NEB00023	2/2/2021	William	32
d376zze4be63	6262633	YSD07507	2/2/2022	Doe	20

The query I managed to do was

WITH 
table2 AS (
    SELECT
        id
        ,id2
        ,id3
        ,min(created_on) AS created_on
        ,wbs.engmt_prtnr_text
        ,ROW_NUMBER() OVER(PARTITION BY id, id2,id3 ORDER BY MIN(created_on) DESC) AS position
    FROM
         source
    GROUP BY
        id
        ,id2
        ,id3
        ,engmt_prtnr_text
        ),
table1 AS (
    SELECT
        id
        ,id2
        ,id3
        ,MIN(created_on) AS created_on
        ,SUM(total) AS total
    FROM
         source
    GROUP BY
        id
        ,id2
        ,id3
)      
    
SELECT t1.id
       ,t1.id2
       ,t1.id3
       ,t1.created_on
       ,t2.engmt_prtnr_text
       ,t1.total
FROM
    tabla1 AS t1
LEFT JOIN
    tabl2 AS t2
    ON
        t1.id= t2.id
        AND t1.id2 = t2.id2
        AND t1.id3= t2.id3
        AND t2.position = 1

Please update your question with the SQL you’ve managed to write. You should also always provide all information as editable text, rather than images, so that someone can copy it if they want to try a solution. It would also be helpful if you provided the source data that results in the output you are showing — NickW, Mar 27 '23 at 20:30

Can I get a specific column value based on an aggregated column?

0 Answers0