2

I have a table containing date and events and users. There is event named 'A'. I want to find out how many time a particular event occurred before and after event 'A' in Sql Bigquery. for Example,

   User           Date             Events
    123          2018-02-13            X.Y.A
    123          2018-02-12            X.Y.B
    134          2018-02-10            Y.Z.A
    123          2018-02-11            A
    123          2018-02-01            X.Y.Z
    134          2018-02-05            X.Y.B
    134          2018-02-04            A

The Output would be something like this

User       Event    Before   After
123          A      1        3
134          A      0        1

The event that I have to count contains a particular prefix. Means I have to check events that start with ( X.Y.then some event name). So, X.Y.SomeEvent are the events that I have to set counter for. Any Suggestions?

VSR
  • 87
  • 2
  • 18

2 Answers2

5

below is for BigQuery SQL

#standardSQL
SELECT user, event, before, after 
FROM (
  SELECT user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING ) after
  FROM `project.dataset.events`
)
WHERE event = 'A'
-- ORDER BY user

you can test it with dummy data in your question

#standardSQL
WITH `project.dataset.events` AS (
  SELECT 123 user, '2018-02-13' dt, 'X.Y.A' event UNION ALL
  SELECT 123, '2018-02-12', 'X.Y.B' UNION ALL
  SELECT 123, '2018-02-11', 'A' UNION ALL
  SELECT 134, '2018-02-10', 'Y.Z.A' UNION ALL
  SELECT 134, '2018-02-05', 'X.Y.B' UNION ALL
  SELECT 134, '2018-02-04', 'A' UNION ALL
  SELECT 123, '2018-02-01', 'X.Y.Z' 
)
SELECT user, event, before, after 
FROM (
  SELECT user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING ) after
  FROM `project.dataset.events`
)
WHERE event = 'A'
ORDER BY user
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Hey Mikhail, Thanks it worked for me. I just wanted to know one more thing. Here we have assumed that A occurs only one time. What if event A occurs multiple times for a user? How can we make sure that same event "X.Y" is not counted in with another A? – VSR Feb 14 '18 at 20:11
  • in your previous question you stated that `Each user has Event A only once` thus the above solution takes advantage of this fact. – Mikhail Berlyant Feb 14 '18 at 20:15
  • Yeah, but I am trying to learn this and would love to know how to achieve that. – VSR Feb 15 '18 at 14:27
  • 1
    Understood. I recommend you to post your new question so we are not limited by format of comments :o) – Mikhail Berlyant Feb 15 '18 at 14:40
1

User window functions to find the date when "A" occurs. Then use conditional aggregation to count the events before and after:

select userid,
       sum(case when date < a_date and event like 'X.Y%' then 1 else 0 end) as before,
       sum(case when date > a_date and event like 'X.Y%' then 1 else 0 end) as before
from (select t.*,
             min(case when event = 'A' then date end) over (partition by userid) as a_date
      from t
     ) t
group by userid
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Hey Gordon, I am getting this error while running the query. Can you tell me what am I doing wrong? Error: SELECT clause has mix of aggregations 'a_date' and fields 'user, name, EventDate' without GROUP BY clause – VSR Feb 14 '18 at 19:44