1

I’m trying to calculate a 3 month rolling average grouped by region and month, as in

Region  Month                 Avg(var_a)    Avg(var_b)
Northland   Dec-Jan-Feb       7.1           5.9
Southland   Dec-Jan-Feb       7.2               6.1
Northland   Nov-Dec-Jan       7.4           6.1
Southland   Nov-Dec-Jan       7.5           6.2
Northland   Oct-Nov-Dec       7.5               6.2
Southland   Oct-Nov-Dec       7.5           6.1

Note that month is expanded for illustrative purposes, I’d really expect the output to just say a single month.

Now I can do this by creating a CTE grouping by region and month, then joining to it a couple times like

With month_rollup_cte as
    (Select region,month,sum(var_a) a_sum, sum(var_b) b_sum, count(1) cnt
From vw_score_by_region
    Group by region,month)
Select c1.region, c1.month,sum(c1.a_sum + c2.a_sum + c3.a_sum) / sum(c1.cnt + c2.cnt + c3.cnt) a_avg, sum(c1.b_sum + c2.b_sum + c3.b_sum) / sum(c1.cnt + c2.cnt + c3.cnt) b_avg
From month_rollup_cte c1
Join month_rollup_cte c2 on c1.region = c2. Region and c1.month = dateadd(mm,1,c2.month)
Join month_rollup_cte c3 on c1.region = c3. Region and c1.month = dateadd(mm,2,c3.month)
Group by c1.region, c1.month;

But that’s ugly, imagine if you had to do a 6 month rolling average or 12 month rolling average… I’m trying to use the t-sql 2012 analytic functions, specifically the RANGE option. I’ve used ROWS preceding before, but never range.

What I tried was

select region,avg(var_a) OVER (order by (year(entry_month) * 100 + month(entry_month)) range between 2 preceding and 1 following)    
from [dbo].[vw_score_by_region]
group by region

But I get a syntax error:

*Msg 8120, Level 16, State 1, Line 2
Column 'dbo.vw_score_by_region.month' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.*

Clearly I'm doing something silly, but I'm not sure what.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
corrin
  • 63
  • 1
  • 6
  • youe need to "move" that group by into the window function: `avg(var_) over (partition by region oder by ...) ` –  Mar 23 '13 at 08:41

1 Answers1

1

First of all RANGE is only supported with UNBOUNDED and CURRENT ROW frame delimiters, It cannot be used with N PRECEDING or N FOLLOWING.
From your title, looks like your want to get 3 months rolling avg (sliding avg), then you'd better to use ROWS
Using ROWS (This is more likely what you need) SQl Fiddle Demo


select region,
       avg(var_a) OVER (partition by region 
                        order by (entry_month) 
                        rows between 2 preceding and current row) as ThreeMonthSlidingAvg  
from [dbo].[vw_score_by_region]

Note:

No need to calcuate year+month, if entry_month is date or datetime, it is sortable already, thanks for Steve's correction.
Using RANGE:

select region,
       avg(var_a) OVER (partition by region,(year(entry_month) * 12 + month(entry_month))/3
order by (entry_month) range between unbounded preceding and current row) as ThreeMonthSlidingAvg
from [dbo].[vw_score_by_region]
Note: Using RANGE you have to control the partition width, since you want to agg by 3 month, and range doesn't support N PRECEDING and N FOLLOWING, it only supports following:
|  UNBOUNDED PRECEDING  | Starts the window at first row of the partition 
|  UNBOUNDED FOLLOWING  | Ends the window at last row of the partition 
|  CURRENT ROW          | Starts or Ends the window at current row 
ljh
  • 2,546
  • 1
  • 14
  • 20
  • Thanks ljh, I'll use rows rather than range with a CTE/nested query to aggregate my data to monthly so that rows works correctly. That's quite a lot nicer than my approach of joining the data to itself multiple times – corrin Mar 23 '13 at 22:52
  • 1
    Inside the OVER clause, ORDER BY entry_month should work if entry_month is a date or datetime value. I don't think there's any need to order by the messier expression year(entry_month)*12+month(entry_month), and ordering by a column provides the optimizer with the chance to use an index, if there is one on (region,entry_month). – Steve Kass Mar 26 '13 at 03:34
  • Fully agree, it is not necessary to use year, month and calculate them, it has no benifit but harm the performance, lost sargability. Update answer. – ljh Mar 26 '13 at 03:57