-3

I'd like to generate a rolling average variable from a basketball dataset. So if the first observation is 25 points on January 1, the generated variable will show 25. If the second observation is 30 points on January 2, the variable generated will show 27.5. If the third observation is 35 points, the variable generated will show 30, etc.

Nick Cox
user3654703

2 Answers

3

For a variable y ordered by some time variable t, the simplest version of the average of values to date is

gen yave = sum(y) / _n 

which is the cumulative sum divided by the number of observations so far. If there are occasional missing values, sum() ignores them (it treats them as zero), but _n still counts them, so the denominator needs to be fixed, say

 gen yave = sum(y) / sum(y < .) 

This generalises easily to panel structure

 bysort id (t) : gen yave = sum(y) / sum(y < .) 
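
As a quick check of the panel version, here is a minimal sketch on a made-up dataset. The variable names id, t and y follow the notation above; the data values are invented for illustration, with one missing score for the first player.

```stata
clear
* toy data: two players (id), game order (t), points (y); one missing score
input id t y
1 1 25
1 2 30
1 3 .
1 4 35
2 1 10
2 2 20
end

* running mean of nonmissing values to date, restarted for each id
bysort id (t) : gen yave = sum(y) / sum(y < .)

list, sepby(id)
* yave is 25, 27.5, 27.5, 30 for id 1 and 10, 15 for id 2
```

The missing game leaves the running mean unchanged at 27.5, because it adds nothing to either the numerator or the denominator. If a panel starts with a missing value, the denominator is zero for that observation and yave is missing until the first nonmissing value arrives.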
Nick Cox
0

Here is the solution I came up with. I created three variables: a cumulative point total (the numerator), a running count of games (the denominator), and their ratio, which gives each player's points per game:

* assumes the data are already sorted by player and game date
gen player_pts = points if player != player[_n-1]
replace player_pts = points + player_pts[_n-1] if player == player[_n-1] & _n != 1
by player: gen player_games = _n
gen ppg = player_pts / player_games
Nick Cox
user3654703
  • This is correct so long as there are no missing values. Note that creating a new variable for the count is not necessary. – Nick Cox Feb 15 '15 at 23:59