I have a table that records user transactions like this (simplified version) in BigQuery
user | transaction_date | label | cost |
---|---|---|---|
a | 2021-10-31 10:30:00 | y1 | 10 |
b | 2021-10-31 10:30:00 | y2 | 10 |
c | 2021-10-31 10:30:00 | y1 | 10 |
a | 2021-11-31 10:30:00 | y1 | 10 |
a | 2021-12-31 10:30:00 | y2 | 10 |
b | 2021-11-31 10:30:00 | y3 | 10 |
c | 2021-11-31 10:30:00 | y1 | 10 |
b | 2021-12-31 10:30:00 | y2 | 10 |
c | 2021-12-31 10:30:00 | y1 | 10 |
I am interested in information related to cost and current/next label. How can I use LEAD() to return the next different label as label_next?
i.e
user | transaction_date | label | cost | label_next |
---|---|---|---|---|
a | 2021-10-31 10:30:00 | y1 | 10 | y2 |
b | 2021-10-31 10:30:00 | y2 | 10 | y3 |
c | 2021-10-31 10:30:00 | y1 | 10 | y3 |
a | 2021-11-31 10:30:00 | y1 | 10 | y2 |
a | 2021-12-31 10:30:00 | y2 | 10 | y5 |
b | 2021-11-31 10:30:00 | y3 | 10 | y2 |
c | 2021-11-31 10:30:00 | y1 | 10 | y3 |
b | 2021-12-31 10:30:00 | y2 | 10 | null |
c | 2021-12-31 10:30:00 | y3 | 10 | null |
a | 2021-12-31 18:30:00 | y5 | 10 | null |
standard LEAD() would return just the next label i.e. for row 1 (user a) would return y1 as the same user is registered with the same label once before being seen with a different label.
I think I have one solution that involves: grouping by user, label and calculate min and max transaction_date then use LEAD() to get label_next per grouping and join that table on the initial table on user and transaction_date inside min and max transaction_date
But is there a way to do it differently?