-1

I have a table of events which has:

  • user_id
  • event_name
  • event_time

There are event names of types: meeting_started, meeting_ended, email_sent

I want to create a query that counts the number of times an email has been send during a meeting.

UPDATE: I'm using Google BigQuery.

Example query:

SELECT
event_name,
count(distinct user_id) users,
FROM
events_table WHERE
and event_name IN ('meeting_started', 'meeting_ended')
group by 1

How can I achieve that? Thanks!

GMB
  • 216,147
  • 25
  • 84
  • 135
user2101699
  • 207
  • 3
  • 16

2 Answers2

1

You can do this in BigQuery using last_value():

Presumably, an email is send during a meeting if the most recent "meeting" event is 'meeting_started'. So, you can solve this by getting the most recent meeting event for each event and then filtering:

select et.*
from (select et.*,
             last_value(case when event_name in ('meeting_started', 'meeting_ended') then event_name end) ignore nulls) over
                 (partition by user_id order by event_time) as last_meeting_event
      from events_table et
     ) et
where event_name = 'email_sent' and last_meeting_event = 'meeting_started'
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

This reads likes some kind of gaps-and-islands problem, where an island is a meeting, and you want emails that belong to islands.

How do we define an island? Assuming that meeting starts and ends properly interleave, we can just compare the count of starts and ends on a per-user basis. If there are more starts than ends, then a meeting is in progress. Using this logic, you can get all emails that were sent during a meeting like so:

select *
from (
    select e.*,
        countif(event_name = 'meeting_started') over(partition by user_id order by event_time) as cnt_started,
        countif(event_name = 'meeting_ended'  ) over(partition by user_id order by event_time) as cnt_ended
    from events_table e
) e
where event_name = 'email_sent' and cnt_started > cnt_ended

It is unclear where you want to go from here. If you want the count of such emails, just use select count(*) instead of select * in the outer query.

GMB
  • 216,147
  • 25
  • 84
  • 135