How to compute the average event frequency over a partition in BigQuery

Question

I have a BigQuery table that is essentially an event trace for a given user session. I would like to partition the data by track in the example and produce a frequency distribution of events averaged over tracks.

track Event Name
1     A
1     B
1     C
2     A
3     A
3     B

So for these data the result would be

A - 100% B - 66% C - 33%

Which is to say all tracks contain event A, 66% of tracks contain event B and 33% of tracks contain event C.

I'd like to do this with analytic functions in SQL and PARTITION BY but am unsure how to proceed. Thanks!

Do you have a fixed number of events (ie, only A,B,C)? This would be fairly easy with CTEs and left joins, are you required to use analytic functions? — rtenha, Sep 30 '19 at 22:17
Yes the event names are enumerated (a finite set) - I don't need to use analytic functions that was just where my relative non-expertise in SQL led me. — Sean K, Sep 30 '19 at 22:19

khan · Answer 1 · 2019-09-30T23:07:59.157

0

If you want to use partitioning, you can do it this way:

select event, 
   round( (count(track)/count(distinct event))/count(distinct event) over () * 100, 2) as perc
   from (
      select distinct
         event, 
         track
      from `yourdataset.table`
   )
group by 1

Hope it helps.

edited Sep 30 '19 at 23:07

answered Sep 30 '19 at 22:22

khan

7,005
15
48
70

score 0 · Answer 2 · answered Sep 30 '19 at 22:23

Below is for BigQuery Standard SQL

#standardSQL
SELECT DISTINCT event, 
  ROUND(100 * COUNT(1) OVER(PARTITION BY event) / COUNT(DISTINCT track) OVER(), 2) frequency
FROM `project.dataset.table`

if to apply to sample data from your question - result is

Row event   frequency    
1   A       100.0    
2   B       66.67    
3   C       33.33

How to compute the average event frequency over a partition in BigQuery

2 Answers2