How to get the latest change from a table per ID

Question

I have a table like

ID	TYPE	START	END	TIMESTAMP_
1	A	NULL	5	'2022-03-03'
1	B	NULL	7	'2023-07-19'
1	A	5	NULL	'2023-07-20'
1	B	NULL	7	'2023-07-21'
4	A	NULL	20	'2023-06-30'
4	A	20	17	'2023-07-01'

I am trying to get the latest change per ID per TYPE (the change is shown as START and END like for the first row, the ID went from NULL to 5 for TYPE A and NULL to 7 for TYPE B, THEN 5 back to NULL for TYPE A etc). I have done

SELECT *
FROM MY_TABLE
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE, ORDER BY TIMESTAMP_ DESC) = 1;

This results in a table like

ID	TYPE	START	END	TIMESTAMP_
1	A	5	NULL	'2023-07-20'
1	B	NULL	7	'2023-07-21'
4	A	20	17	'2023-07-01'

Now I am trying to get the changes to sort of coalesce vertically within id group. I am trying to end up with a table like

ID	TYPE	START	END	TIMESTAMP_
1	A_B	5	7	'2023-07-21'
4	A	20	17	'2023-07-01'

where if there are two types within the ID grouping then the value for TYPE can be renamed to include both TYPES that were used. Is there a way to achieve this? Thanks!

score 0 · Answer 1 · answered Jul 25 '23 at 17:50

Yes we can definetly do that. So to use coleasce verically you have to use LISTAGG which ignores nulls and concatenates the records ordered based on column you provide and then simply get the first record in LISTAGG output.

SELECT ID
    ,LISTAGG(TYPE,'_') within group(order by timestamp_)
    ,SPLIT(LISTAGG(start_,'_') within group(order by timestamp_),'_')[0]::INT start_
    ,SPLIT(LISTAGG(END_,'_') within group(order by timestamp_),'_')[0]::INT END_
FROM 
    (SELECT *
     FROM MY_TABLE
     QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE ORDER BY TIMESTAMP_ DESC) = 1
    ) r
GROUP BY ID
ORDER BY ID

score 0 · Accepted Answer · answered Jul 25 '23 at 21:42

Another method is to use MAX_BY and MIN_BY and avoid carrying to much state, if you have large counts of type per id:

with fake_table(ID, TYPE, START_, END_, TIMESTAMP_) as (
    select * from values 
        (1, 'A', NULL,  5,  '2022-03-03'),
        (1, 'A', 5, NULL,   '2023-07-20'),
        (1, 'B', NULL,  7,  '2023-07-19'),
        (1, 'B', NULL,  7,  '2023-07-21'),
        (4, 'A', NULL,  20, '2023-06-30'),
        (4, 'A', 20,    17, '2023-07-01')
), latest_rows as (
    SELECT *
    FROM fake_table
    QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE ORDER BY TIMESTAMP_ DESC) = 1
)
select 
    id
    ,LISTAGG(type, '_') within group (order by  TIMESTAMP_) as type
    ,MIN_BY(START_, TIMESTAMP_) as start_
    ,MAX_BY(END_, TIMESTAMP_) as end_
from latest_rows
group by id
order by id;

gives:

in order to apply this to a larger set of values, wouldn't max_by and min_by only solve it one way, what if it was reversed? Is there a more general way to coalesce the values? — Anonymous, Jul 26 '23 at 21:03
To be honest I am not sure what your question is. If you are asking “will this work in more complex example” the answer is it depends. But this is to be expected. Sql is a programming language, and many tricks/optimazations rely on side effect, so not all side effect scale to infinity. — Simeon Pilgrim, Jul 26 '23 at 23:27

How to get the latest change from a table per ID

2 Answers2