Assume my table structure is this
I am planning to group it by (USER and SEQUENCE ) and get the LEAD timestamp for the next sequence. Here is the output that I am looking for
Can I solve this without JOIN using LEAD function if possible ?
Assume my table structure is this
I am planning to group it by (USER and SEQUENCE ) and get the LEAD timestamp for the next sequence. Here is the output that I am looking for
Can I solve this without JOIN using LEAD function if possible ?
Below is for BigQuery Standard SQL
I will present two options - using JOIN (just to justify that I understood/reversed-engineered expected logic correctly) and then JOIN-less version (note I am using ts
as a field name instead of timestamp
)
Using JOIN
#standardSQL
SELECT a.user, a.sequence, MIN(b.ts) ts
FROM (
SELECT user, sequence, MAX(ts) AS max_ts
FROM `project.dataset.table`
GROUP BY user, sequence
) a
LEFT JOIN `project.dataset.table` b
ON a.user = b.user AND b.sequence = a.sequence + 1
WHERE a.max_ts <= IFNULL(b.ts, a.max_ts)
GROUP BY user, sequence
-- ORDER BY user, sequence
JOIN-less version
#standardSQL
SELECT
user, sequence,
(
SELECT ts FROM UNNEST(arr_ts) ts
WHERE max_ts < ts ORDER BY ts LIMIT 1
) ts
FROM (
SELECT
user, sequence, max_ts,
LEAD(arr_ts) OVER (PARTITION BY user ORDER BY sequence) arr_ts
FROM (
SELECT
user, sequence, MAX(ts) max_ts,
ARRAY_AGG(ts ORDER BY ts) arr_ts
FROM `project.dataset.table`
GROUP BY user, sequence
)
)
-- ORDER BY user, sequence
Both above versions can be tested / played with using below dummy data
WITH `project.dataset.table` AS (
SELECT 'user1' user, 2 sequence, 'T1' ts UNION ALL
SELECT 'user1', 2, 'T2' UNION ALL
SELECT 'user1', 1, 'T3' UNION ALL
SELECT 'user1', 1, 'T4' UNION ALL
SELECT 'user1', 3, 'T5' UNION ALL
SELECT 'user1', 2, 'T6' UNION ALL
SELECT 'user1', 3, 'T7' UNION ALL
SELECT 'user1', 3, 'T8'
)
and both returns below result
user sequence ts
user1 1 T6
user1 2 T7
user1 3 null
Not sure about bigquery, but in general SQL it would be written as:
select user, sequence, LEAD (max_timestamp,1) OVER (PARTITION BY user ORDER BY sequence) as timestamp
from (
select user, sequence, max(timestamp) as max_timestamp
from table
group by user, sequence) q1;
Just be aware of reserved words suchas table, user, timestamp etc.
Edit: Yes, forget about this answer, wasn't attentive enough about required output. Mikhail got it right!