3

I'm having trouble with a moving average in BigQuery/SQL, I have table 'SCORES' and I need to make a 30d moving average while grouping the data using users, the problem is my dates aren't sequential, e.g there are gaps in it.

Below is my current code:

SELECT user, date,
      AVG(score) OVER (PARTITION BY user ORDER BY date)
FROM SCORES;

I don't know how to add the date restrictions into that line or if this is even possible.

My current table looks like this, but of course with a lot more users:

user    date    score
AA  13/02/2018  2.00
AA  15/02/2018  3.00
AA  17/02/2018  4.00
AA  01/03/2018  5.00
AA  28/03/2018  6.00

Then I need it to become, this:

user    date    score   30D Avg
AA  13/02/2018  2.00    2.00
AA  15/02/2018  3.00    2.50
AA  17/02/2018  4.00    3.00
AA  01/03/2018  5.00    3.50
AA  28/03/2018  6.00    5.50

Where in the last row, it's only measuring backward one because of the date (up to 30D backwards) is there any way to implement this in SQL or am I asking for too much?

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
bknight
  • 85
  • 7

2 Answers2

5

You want to use range between. For this, you need an integer, so:

select s.*,
       avg(score) over (partition by user
                        order by days
                        range between 29 preceding and current row
                       ) as avg_30day
from (select s.*, date_diff(s.date, date('2000-01-01'), day) as days
      from scores s
     ) s;

An alternative to date_diff() is unix_date():

select s.*,
       avg(score) over (partition by user
                        order by unix_days
                        range between 29 preceding and current row
                       ) as avg_30day
from (select s.*, unix_date(s.date) as unix_days
      from scores s
     ) s;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Below is for BigQuery Standard SQL

#standardSQL
SELECT *,
  AVG(score) OVER (
    PARTITION BY user 
    ORDER BY UNIX_DATE(PARSE_DATE('%d/%m/%Y', date))
    RANGE BETWEEN 29 PRECEDING AND CURRENT ROW
  ) AS avg_30day 
FROM `project.dataset.scores` 

You can test / play with above using dummy data from your question

#standardSQL
WITH `project.dataset.scores` AS (
  SELECT 'AA' user, '13/02/2018' date, 2.00 score UNION ALL
  SELECT 'AA', '15/02/2018', 3.00 UNION ALL
  SELECT 'AA', '17/02/2018', 4.00 UNION ALL
  SELECT 'AA', '01/03/2018', 5.00 UNION ALL
  SELECT 'AA', '28/03/2018', 6.00 
)
SELECT *,
  AVG(score) OVER (
    PARTITION BY user 
    ORDER BY UNIX_DATE(PARSE_DATE('%d/%m/%Y', date))
    RANGE BETWEEN 29 PRECEDING AND CURRENT ROW
  ) AS avg_30day 
FROM `project.dataset.scores` 

result

Row user    date        score   avg_30day    
1   AA      13/02/2018  2.0     2.0  
2   AA      15/02/2018  3.0     2.5  
3   AA      17/02/2018  4.0     3.0  
4   AA      01/03/2018  5.0     3.5  
5   AA      28/03/2018  6.0     5.5  
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230