3

I am trying to perform a window function on a data-set in Redshift using days an an interval for the preceding rows. Example data:

date        ID      score
3/1/2017    123     1
3/1/2017    555     1
3/2/2017    123     1
3/3/2017    555     3
3/5/2017    555     2

SQL window function for avg score from the last 3 scores:

select
      date,
      id,
      avg(score) over 
         (partition by id order by date rows 
              between preceding 3 and 
                      current row) LAST_3_SCORES_AVG,
from DATASET

Result:

date        ID      LAST_3_SCORES_AVG
3/1/2017    123     1
3/1/2017    555     1
3/2/2017    123     1
3/3/2017    555     2
3/5/2017    555     2

Problem is that I would like the average score from the last 3 DAYS (moving average) and not the last three tests. I have gone over the Redshift and Postgre Documentation and can't seem to find any way of doing it.

Desired Result:

date        ID      3_DAY_AVG
3/1/2017    123     1
3/1/2017    555     1
3/2/2017    123     1
3/3/2017    555     2
3/5/2017    555     2.5

Any direction would be appreciated.

AYR
  • 1,139
  • 3
  • 14
  • 24

2 Answers2

1

You can use lag() and explicitly calculate the average.

select t.*,
       (score +
        (case when lag(date, 1) over (partition by id order by date) >=
                   date - interval '2 day'
              then lag(score, 1) over (partition by id order by date)
              else 0
         end) +
        (case when lag(date, 2) over (partition by id order by date) >=
                   date - interval '2 day'
              then lag(score, 2) over (partition by id order by date)
              else 0
         end)
        )
       ) /
       (1 +
        (case when lag(date, 1) over (partition by id order by date) >=
                   date - interval '2 day'
              then 1
              else 0
         end) +
        (case when lag(date, 2) over (partition by id order by date) >=
                   date - interval '2 day'
              then 1
              else 0
         end)
       )
from dataset t;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    Firstly, thank you! Read you book, good read! I was hoping that there would be a dynamic answer that wouldn't require a hardcoded solution for 3 days specifically. – AYR Mar 22 '17 at 12:13
1

The following approach could be used instead of the RANGE window option in a lot of (or all) cases. You can introduce "expiry" for each of the input records. The expiry record would negate the original one, so when you aggregate all preceding records, only the ones in the desired range will be considered. AVG is a bit harder as it doesn't have a direct opposite, so we need to think of it as SUM/COUNT and negate both.

SELECT id, date, running_avg_score
FROM
(
    SELECT id, date, n,
            SUM(score) OVER (PARTITION BY id ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
            / NULLIF(SUM(n) OVER (PARTITION BY id ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0) as running_avg_score
    FROM
    (
        SELECT date, id, score, 1 as n   
        FROM DATASET
        UNION ALL
        -- expiry and negate
        SELECT DATEADD(DAY, 3, date), id, -1 * score, -1
        FROM DATASET
    )
) a
WHERE a.n = 1
Hoodahelll
  • 21
  • 1
  • 3