2

How should I use the analytics to give me a rolling average where the sample size changes?

create table MyVals  (Item_no char(10), epoch number, Yield number, Skip_Period char(1), Reset_Period char(1));

insert into MyVals values ('A00001',1705, 12, 'N','N');     /* 17.18181818  average of epochs 1705..1610 & 1607..1606 */    
insert into MyVals values ('A00001',1704, 13, 'N','N');     /* 19.45454545  average of epochs 1704..1610 & 1607..1605 */    
insert into MyVals values ('A00001',1703, 9,  'N','N');     /* 20.36363636  average of epochs 1703..1610 & 1607..1604 */    
insert into MyVals values ('A00001',1702, 11, 'N','N');     /* 21.5     average of epochs 1702..1610 & 1607..1604 */    
insert into MyVals values ('A00001',1701, 4,  'N','N');     /* 22.66666667  average of epochs 1701..1610 & 1607..1604 */    
insert into MyVals values ('A00001',1613, 16, 'N','N');     /* 25       average of epochs 1613..1610 & 1607..1604 */    
insert into MyVals values ('A00001',1612, 33, 'N','N');     /* 26.28571429  average of epochs 1612..1610 & 1607..1604 */    
insert into MyVals values ('A00001',1611, 2,  'N','N');     /* 25.16666667  average of epochs 1611..1610 & 1607..1604 */    
insert into MyVals values ('A00001',1610, 1,  'N','N');     /* 29.8     average of epochs 1610 & 1607..1604       */    
insert into MyVals values ('A00001',1609, 66, 'Y','N');     /* 37       average of epochs 1607..1604              */    
insert into MyVals values ('A00001',1608, 23, 'Y','N');     /* 37       average of epochs 1607..1604              */    
insert into MyVals values ('A00001',1607, 22, 'N','N');     /* 37       average of epochs 1607..1604              */    
insert into MyVals values ('A00001',1606, 66, 'N','N');     /* 42       average of epochs 1606..1604              */    
insert into MyVals values ('A00001',1605, 37, 'N','N');     /* 30       average of epochs 1605..1604              */    
insert into MyVals values ('A00001',1604, 23, 'N','Y');     /* 23       average of epochs 1604                    */    
insert into MyVals values ('A00001',1603, 77, 'N','N');     /* 44.83333333  average of epochs 1603..1511              */    
insert into MyVals values ('A00001',1602, 15, 'N','N');     /* 38.4     average of epochs 1602..1511              */    
insert into MyVals values ('A00001',1601, 82, 'N','N');     /* 44.25    average of epochs 1601..1511              */    
insert into MyVals values ('A00001',1513, 4,  'N','N');     /* 31.66666667  average of epochs 1513..1511              */    
insert into MyVals values ('A00001',1512, 7,  'N','N');     /* 45.5     average of epochs 1512..1511              */    
insert into MyVals values ('A00001',1511, 84, 'N','N');     /* 84       average of epochs 1511                    */    

How do I get the the average Yield of the preceding maximum 13 records where Skip_Period = 'N' while Reset_Period = 'N'

So the window for the average changes depending on the value of skip_Period and Reset_Period based on these rules:

If a row has Reset_Period = 'Y', then don't go back any further than that record. If a row has Skip_period = 'Y', then exclude that period from the average sample

I can't figure how to create a range between expression that will give me the rolling average I need using analytics.

Any suggestions welcome :)

lidbanger
  • 147
  • 4
  • 10
  • The bit in the comment is what the rolling average would be for that row based on the rules of average of max 13 rows, excluding skip_period='Y' and going back no further than rest_period = 'Y' if that row falls within the 13. – lidbanger Nov 22 '16 at 16:13

1 Answers1

1

I think this is what you're after:

WITH res AS (SELECT item_no,
                    epoch_number,
                    yield,
                    skip_period,
                    reset_period,
                    SUM(CASE WHEN reset_period = 'Y' THEN 1 ELSE 0 END) OVER (PARTITION BY item_no ORDER BY epoch_number) grp
             FROM   myvals)
SELECT item_no,
       epoch_number,
       yield,
       skip_period,
       reset_period,
       grp,
       AVG(CASE WHEN skip_period = 'N' THEN yield END) OVER (PARTITION BY item_no, grp
                                                             ORDER BY epoch_number
                                                             rows 12 preceding) rolling_avg_yield
FROM   res
ORDER BY epoch_number DESC;

ITEM_NO EPOCH_NUMBER      YIELD SKIP_PERIOD RESET_PERIOD        GRP ROLLING_AVG_YIELD
------- ------------ ---------- ----------- ------------ ---------- -----------------
A00001          1705         12 N           N                     1  17.1818181818182
A00001          1704         13 N           N                     1  19.4545454545455
A00001          1703          9 N           N                     1  20.3636363636364
A00001          1702         11 N           N                     1              21.5
A00001          1701          4 N           N                     1  22.6666666666667
A00001          1613         16 N           N                     1                25
A00001          1612         33 N           N                     1  26.2857142857143
A00001          1611          2 N           N                     1  25.1666666666667
A00001          1610          1 N           N                     1              29.8
A00001          1609         66 Y           N                     1                37
A00001          1608         23 Y           N                     1                37
A00001          1607         22 N           N                     1                37
A00001          1606         66 N           N                     1                42
A00001          1605         37 N           N                     1                30
A00001          1604         23 N           Y                     1                23
A00001          1603         77 N           N                     0  44.8333333333333
A00001          1602         15 N           N                     0              38.4
A00001          1601         82 N           N                     0             44.25
A00001          1513          4 N           N                     0  31.6666666666667
A00001          1512          7 N           N                     0              45.5
A00001          1511         84 N           N                     0                84

First of all, you need to work out the groups you're averaging over. We can do this by generating values of 1 or 0 based on whether the reporting group changes or not, and then doing a running sum across those values.

Once we have that, it's just a matter of including that column in the partition by and then doing a conditional avg if the skip period is N, over the current and 12 preceding rows.

Boneist
  • 22,910
  • 1
  • 25
  • 40
  • That's almost right - the rule about the skip period is weird, which why we don't agree with the first 2 values. When there is a skip period = 'Y' that value is excluded from the average, but the sample size is also reduced. So for epoch 1705 its the average of 1705..1610 and 1607..1606. A total of 11 samples. – lidbanger Nov 22 '16 at 16:38
  • 11? Isn't it 13 - 1705..1610 and 1607..1604? Besides, it's only the top two averages that are different; the others are consistent. I think you have maybe made a slip up somewhere? – Boneist Nov 22 '16 at 16:45
  • No it's right. It's just the strange business rules :/ The skip period=Y are excluded from the average but the span from start epoch to end epoch can be no more than 13. – lidbanger Nov 23 '16 at 09:24
  • Aah, sorry; I missed that particular requirement. I take it (item_no, epoch_number) is unique? – Boneist Nov 23 '16 at 09:28
  • 1
    Ok, I've updated my answer; this time I think it's correct! You'll probably need to round the avg to get whichever number of decimal places you need the answer to be at. – Boneist Nov 23 '16 at 09:52
  • Sweet! That's perfect. Clever use of the sum to produce the grp. Thanks :) – lidbanger Nov 23 '16 at 12:05