0

I have a BigQuery table that is essentially an event trace for a given user session. I would like to partition the data by track in the example and produce a frequency distribution of events averaged over tracks.

track Event Name
1     A
1     B
1     C
2     A
3     A
3     B

So for these data the result would be

A - 100% B - 66% C - 33%

Which is to say all tracks contain event A, 66% of tracks contain event B and 33% of tracks contain event C.

I'd like to do this with analytic functions in SQL and PARTITION BY but am unsure how to proceed. Thanks!

Sean K
  • 113
  • 1
  • 7
  • Do you have a fixed number of events (ie, only A,B,C)? This would be fairly easy with CTEs and left joins, are you required to use analytic functions? – rtenha Sep 30 '19 at 22:17
  • Yes the event names are enumerated (a finite set) - I don't need to use analytic functions that was just where my relative non-expertise in SQL led me. – Sean K Sep 30 '19 at 22:19

2 Answers2

0

If you want to use partitioning, you can do it this way:

select event, 
   round( (count(track)/count(distinct event))/count(distinct event) over () * 100, 2) as perc
   from (
      select distinct
         event, 
         track
      from `yourdataset.table`
   )
group by 1

Hope it helps.

khan
  • 7,005
  • 15
  • 48
  • 70
0

Below is for BigQuery Standard SQL

#standardSQL
SELECT DISTINCT event, 
  ROUND(100 * COUNT(1) OVER(PARTITION BY event) / COUNT(DISTINCT track) OVER(), 2) frequency
FROM `project.dataset.table`  

if to apply to sample data from your question - result is

Row event   frequency    
1   A       100.0    
2   B       66.67    
3   C       33.33   
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230