3

Assume my table structure is this

enter image description here

I am planning to group it by (USER and SEQUENCE ) and get the LEAD timestamp for the next sequence. Here is the output that I am looking for

enter image description here

Can I solve this without JOIN using LEAD function if possible ?

phaigeim
  • 729
  • 13
  • 34

2 Answers2

4

Below is for BigQuery Standard SQL

I will present two options - using JOIN (just to justify that I understood/reversed-engineered expected logic correctly) and then JOIN-less version (note I am using ts as a field name instead of timestamp)

Using JOIN

#standardSQL
SELECT a.user, a.sequence, MIN(b.ts) ts 
FROM (
  SELECT user, sequence, MAX(ts) AS max_ts
  FROM `project.dataset.table`
  GROUP BY user, sequence
) a
LEFT JOIN `project.dataset.table` b
ON a.user = b.user AND b.sequence = a.sequence + 1
WHERE a.max_ts <= IFNULL(b.ts, a.max_ts)
GROUP BY user, sequence
-- ORDER BY user, sequence

JOIN-less version

#standardSQL
SELECT
  user, sequence, 
  (
    SELECT ts FROM UNNEST(arr_ts) ts 
    WHERE max_ts < ts ORDER BY ts LIMIT 1
  ) ts
FROM (
  SELECT
    user, sequence, max_ts,
    LEAD(arr_ts) OVER (PARTITION BY user ORDER BY sequence) arr_ts
  FROM (
  SELECT 
      user, sequence, MAX(ts) max_ts, 
      ARRAY_AGG(ts ORDER BY ts) arr_ts
    FROM `project.dataset.table`
    GROUP BY user, sequence
  )
)
-- ORDER BY user, sequence   

Both above versions can be tested / played with using below dummy data

WITH `project.dataset.table` AS (
  SELECT 'user1' user, 2 sequence, 'T1' ts UNION ALL
  SELECT 'user1', 2, 'T2' UNION ALL
  SELECT 'user1', 1, 'T3' UNION ALL
  SELECT 'user1', 1, 'T4' UNION ALL
  SELECT 'user1', 3, 'T5' UNION ALL
  SELECT 'user1', 2, 'T6' UNION ALL
  SELECT 'user1', 3, 'T7' UNION ALL
  SELECT 'user1', 3, 'T8' 
)   

and both returns below result

user    sequence    ts   
user1   1           T6   
user1   2           T7   
user1   3           null     
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
1

Not sure about bigquery, but in general SQL it would be written as:

select user, sequence, LEAD (max_timestamp,1) OVER (PARTITION BY user ORDER BY sequence) as timestamp
from (
    select user, sequence, max(timestamp) as max_timestamp
    from table
    group by user, sequence) q1;

Just be aware of reserved words suchas table, user, timestamp etc.

Edit: Yes, forget about this answer, wasn't attentive enough about required output. Mikhail got it right!

Edgars T.
  • 947
  • 8
  • 14