SQL/BIGQUERY Running Average with GAPs in Dates

Question

I'm having trouble with a moving average in BigQuery/SQL, I have table 'SCORES' and I need to make a 30d moving average while grouping the data using users, the problem is my dates aren't sequential, e.g there are gaps in it.

Below is my current code:

SELECT user, date,
      AVG(score) OVER (PARTITION BY user ORDER BY date)
FROM SCORES;

I don't know how to add the date restrictions into that line or if this is even possible.

My current table looks like this, but of course with a lot more users:

user    date    score
AA  13/02/2018  2.00
AA  15/02/2018  3.00
AA  17/02/2018  4.00
AA  01/03/2018  5.00
AA  28/03/2018  6.00

Then I need it to become, this:

user    date    score   30D Avg
AA  13/02/2018  2.00    2.00
AA  15/02/2018  3.00    2.50
AA  17/02/2018  4.00    3.00
AA  01/03/2018  5.00    3.50
AA  28/03/2018  6.00    5.50

Where in the last row, it's only measuring backward one because of the date (up to 30D backwards) is there any way to implement this in SQL or am I asking for too much?

Gordon Linoff · Accepted Answer · 2018-09-22T14:12:36.710

5

You want to use range between. For this, you need an integer, so:

select s.*,
       avg(score) over (partition by user
                        order by days
                        range between 29 preceding and current row
                       ) as avg_30day
from (select s.*, date_diff(s.date, date('2000-01-01'), day) as days
      from scores s
     ) s;

An alternative to date_diff() is unix_date():

select s.*,
       avg(score) over (partition by user
                        order by unix_days
                        range between 29 preceding and current row
                       ) as avg_30day
from (select s.*, unix_date(s.date) as unix_days
      from scores s
     ) s;

edited Sep 22 '18 at 14:12

answered Sep 22 '18 at 13:57

Gordon Linoff

1,242,037
58
646
786

I'm sorry for being a noob, but what is t referring to inside the FROM, statement? Cheers and thank you. – bknight Sep 22 '18 at 14:10
@bknight . . . It was the table name. I missed that your question does have a table name. – Gordon Linoff Sep 22 '18 at 14:13
man, I felt dumb now, this is working flawlessly, thank you so much! – bknight Sep 22 '18 at 14:49

score 0 · Answer 2 · answered Sep 22 '18 at 15:59

Below is for BigQuery Standard SQL

#standardSQL
SELECT *,
  AVG(score) OVER (
    PARTITION BY user 
    ORDER BY UNIX_DATE(PARSE_DATE('%d/%m/%Y', date))
    RANGE BETWEEN 29 PRECEDING AND CURRENT ROW
  ) AS avg_30day 
FROM `project.dataset.scores`

You can test / play with above using dummy data from your question

#standardSQL
WITH `project.dataset.scores` AS (
  SELECT 'AA' user, '13/02/2018' date, 2.00 score UNION ALL
  SELECT 'AA', '15/02/2018', 3.00 UNION ALL
  SELECT 'AA', '17/02/2018', 4.00 UNION ALL
  SELECT 'AA', '01/03/2018', 5.00 UNION ALL
  SELECT 'AA', '28/03/2018', 6.00 
)
SELECT *,
  AVG(score) OVER (
    PARTITION BY user 
    ORDER BY UNIX_DATE(PARSE_DATE('%d/%m/%Y', date))
    RANGE BETWEEN 29 PRECEDING AND CURRENT ROW
  ) AS avg_30day 
FROM `project.dataset.scores`

result

Row user    date        score   avg_30day    
1   AA      13/02/2018  2.0     2.0  
2   AA      15/02/2018  3.0     2.5  
3   AA      17/02/2018  4.0     3.0  
4   AA      01/03/2018  5.0     3.5  
5   AA      28/03/2018  6.0     5.5

SQL/BIGQUERY Running Average with GAPs in Dates

2 Answers2