I am working with following sample data;
dt | ship_id | audit_id | action
2022-01-02 | 1351 | id1 | destroy
2022-01-01 | 1351 | id1 | create
2021-12-12 | 3457 | id2 | create
2021-12-16 | 3457 | id2 | destroy
2021-12-28 | 3457 | id3 | create
To give some context, for a given ship_id
, and audit_id
; an entry has to be created before it is destroyed as defined by action
column. For example, for ship_id=3457, and audit_id=id2; got created on Dec 12 and destroyed on Dec 16.
The goal is to get, for every dt
(when action is created), how many audit_ids are created before it , and how many audit_ids are destroyed before it.
Sample output:
dt | created_cnt | destroyed_cnt
2022-01-01 | 2 | 1
Possible Approach Using self join idea.
select
audit_id,
ship_id,
max(case when action = 'create' then dt end) as creation_time,
max(case when action = 'destroy' then dt end) as removal_time
from table
group by 1,2)
select
t1.creation_time as creation_date,
count(t2.audit_id) as created_cnt,
count(distinct case when t2.removal_time < t1.creation_time then t2.audit_id end) as
destroyed_cnt
from cte as t1
left join cte as t2 on t1.creation_time > t2.creation_time
group by 1
order by 1 desc;
But due to large table, this self-join is slowing things down. Is it possible to use some sort of window functions here to replace joining? Help is appreciated.