1

I have a table containing date and events and users. There is event named 'A'. I want to find out how many time a particular event occurred before and after event 'A' in SQL Bigquery. The event A might appear multiple times. But it should count events only till it encounters another event A in both before and after condition.
for Example,

 User           Date             Events
    123          2018-02-14            X.Y.A
    123          2018-02-12            X.Y.B
    134          2018-02-10            Y.Z.A
    123          2018-02-11            A
    123          2018-02-01            X.Y.Z
    134          2018-02-05            X.Y.B
    134          2018-02-04            A
    123          2018-02-13            A

The output would be something like this.

User       Event    Before   After
123          A      1        1
123          A      0        1
134          A      0        1

The other condition remains same.

This question is an extension of my previous question.

See How to count number of a particular event before another event in SQL Bigquery? for details.

The event that I have to count contains a particular prefix. Means I have to check events that start with ( X.Y.then some event name). So, X.Y.SomeEvent are the events that I have to set the counter for. Any Suggestions?

VSR
  • 87
  • 2
  • 18
  • We can easily answer but if you will just asking without trying first - you will never learn. Show what you tried so far and what problem you have to make it work - by that time you should have good foundation from your previous questions – Mikhail Berlyant Feb 15 '18 at 15:31

2 Answers2

2

Below is for BigQuery Standard SQL

#standardSQL
WITH grps AS (
  SELECT user, dt, event, 
    COUNTIF(event = 'A') OVER(PARTITION BY user ORDER BY dt) grp
  FROM `project.dataset.events`
)
SELECT dt, user, event, before, after 
FROM (
  SELECT dt, user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING ) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN CURRENT ROW AND CURRENT ROW) after
  FROM grps
)
WHERE event = 'A'
-- ORDER BY user  

you can test/play with above using dummy data from your example like below

#standardSQL
WITH `project.dataset.events` AS (
  SELECT 123 user,  '2018-02-14' dt, 'X.Y.A' event UNION ALL
  SELECT 123,       '2018-02-13', 'A'     UNION ALL
  SELECT 123,       '2018-02-12', 'X.Y.B' UNION ALL
  SELECT 123,       '2018-02-11', 'A'     UNION ALL
  SELECT 123,       '2018-02-01', 'X.Y.Z' UNION ALL
  SELECT 134,       '2018-02-10', 'Y.Z.A' UNION ALL
  SELECT 134,       '2018-02-05', 'X.Y.B' UNION ALL
  SELECT 134,       '2018-02-04', 'A'     
), grps AS (
  SELECT user, dt, event, 
    COUNTIF(event = 'A') OVER(PARTITION BY user ORDER BY dt) grp
  FROM `project.dataset.events`
)
SELECT dt, user, event, before, after 
FROM (
  SELECT dt, user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING ) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN CURRENT ROW AND CURRENT ROW) after
  FROM grps
)
WHERE event = 'A'
ORDER BY user  

with result as

Row dt          user    event   before  after    
1   2018-02-11  123     A       1       1    
2   2018-02-13  123     A       1       1    
3   2018-02-04  134     A       0       1    
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0

This is a more general question. Use can use the same ideas with lag() and lead():

select userid,
       (seqnum - lag(seqnum, 1, 0) over (partition by userid, order by date) - 1) as before,
       (lead(seqnum, 1, cnt) over (partition by user_id order by date) - seqnum - 1) as after
from (select t.*,
             row_number() over (partition by userid order by date) as seqnum,
             count(*) over (partition by userid) as cnt
      from t
      where event like 'X.Y%' or event = 'A'
     ) t
where event = 'A';
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Hey Gordon, when I try to implement this, it's giving me an error that event is unrecognized on the last where condition. Can you look at the query suggest what I am doing wrong? – VSR Feb 15 '18 at 19:38
  • @VSR . . . You need to select it in the subquery if you want to use it in the outer query. – Gordon Linoff Feb 16 '18 at 04:03