dynamic rolling average in sql

Question

Hope I can explain this properly

I'm trying to calculate a rolling average, but with a variable number of preceding rows, depending on a rolling sum of another column. Or in other words, I want to calculate the recent average of column b, over previous rows where column a sums to at least X

example data:

CREATE TEMP TABLE t (ts timestamp NOT NULL, a int, b int);

INSERT INTO t VALUES
 ('20200929 11:00',1,50)
,('20200929 12:00',3,25)
,('20200929 13:00',1,20)
,('20200929 14:00',1,19)
,('20200929 15:00',2,25)
,('20200929 16:00',1,22)
,('20200929 17:00',3,19)

Desired result, with X = 3:

ts                  a   b   row_from row_to average_b
2020-09-29 11:00    1   50  NULL     NULL   NULL
2020-09-29 12:00    3   25  1        1      50
2020-09-29 13:00    1   20  2        2      25
2020-09-29 14:00    1   19  2        3      22.5
2020-09-29 15:00    2   25  2        4      21.333
2020-09-29 16:00    1   22  4        5      22.5
2020-09-29 17:00    3   19  5        6      23.5

I'm not counting the a value from the current row, so row_to is always the previous row.

The third row just needs row 2

The fourth row needs rows 2 and 3, because row 2 isn't >=3 by itself. I don't mind going over the X value.

I can do the average over a varaible number of rows, but I don't know how to calculate the row_from required. I imagine it's easy in python, but is there a set based way to do this in TSQL?

I'm using PostgreSQL 10.5, but also could use SQLServer 2019

```LEAD``` or ```LAG``` window functions seem appropriate for this. — alexherm, Sep 30 '20 at 15:54
How are you expecting to get the row_from and row_to values when you don't have a "row_id" column on the base table? — Jason A. Long, Sep 30 '20 at 16:06
@JasonA.Long I'd use the `ROW_NUMBER()` window function, but I don't need the row column in the actual table. This is also just an example table, the actual one has a PK — Angus, Sep 30 '20 at 16:11
You can ```LAG``` on the average sum of column b to maintain a running total in each row. — alexherm, Sep 30 '20 at 18:30

score 1 · Accepted Answer · answered Sep 30 '20 at 18:15

I cannot think of how to do this without recursion.

Except for the arithmetic error for the 16:00 entry ((19 + 25)/2 = 22), this produces your desired output:

with recursive nums as (
  select *, 
         row_number() over (order by ts) as rn
    from t
), stoprns as (
  select *, rn as row_to, a as runsum
    from nums
  union all
  select s.ts, s.a, s.b, n.rn, s.row_to, s.runsum + n.a
    from stoprns s
         join nums n 
           on n.rn = s.rn - 1
          and s.runsum < 3
), ranges as(
  select n.rn, n.ts, n.a, n.b, 
         min(s.rn) as row_from, 
         s.row_to
    from nums n
         left join stoprns s
                on s.row_to = n.rn - 1
   group by s.row_to, n.rn, n.ts, n.a, n.b
)
select *, 
       (select avg(b) from nums where rn between row_from and row_to) as average_b
  from ranges
 group by rn, ts, a, b, row_from, row_to
 order by rn
;

Depending on the size of your table, this may not be practicable, performance-wise.

Working db<>fiddle.

Amazing, thank you! Recursion is still a weak point for me in SQL. I will try to study this and try to understand it. And apologies for the error in desired output, I changed the test data half way through! — Angus, Oct 01 '20 at 07:57

dynamic rolling average in sql

1 Answers1