I'd like to generate a rolling (cumulative) average variable from a basketball dataset. If the first observation is 25 points on January 1, the generated variable should show 25. If the second observation is 30 points on January 2, it should show 27.5. If the third observation is 35 points, it should show 30, and so on.
-3
-
Yeah, I already tried that to no avail. I was hoping for an egen or gen command. – user3654703 Feb 15 '15 at 00:05
-
So show us exactly what you typed, what Stata responded, and what the problem is. – Roberto Ferrer Feb 15 '15 at 00:09
-
You should not post questions without code. If you solve a problem yourself you should post a solution. I've posted a solution to give this thread some value. – Nick Cox Feb 15 '15 at 00:41
2 Answers
3
For a variable y ordered by some time variable t, at its simplest the average of values to date is
gen yave = sum(y) / _n
which is the cumulative sum divided by the number of observations. If there are occasional missing values, they are ignored by sum(), but the denominator needs to be fixed, say
gen yave = sum(y) / sum(y < .)
This generalises easily to a panel structure, with the average computed separately within each id:
bysort id (t) : gen yave = sum(y) / sum(y < .)
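
As a quick check against the numbers in the question, here is a minimal sketch. The variable name points and the first three values come from the question's example; the fourth observation with a missing value is an assumption, added only to show how the two denominators differ.

```stata
clear
input t points
1 25
2 30
3 35
4 .
end

sort t
* running sum divided by the observation number
gen ave_n = sum(points) / _n
* running sum divided by the running count of nonmissing values
gen ave_nonmiss = sum(points) / sum(points < .)

list, noobs
```

The first three values of both variables should be 25, 27.5 and 30, as in the question. In the fourth row, ave_n should drop to 22.5 (90/4), while ave_nonmiss should stay at 30 (90/3), because only the second denominator skips the missing game.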

Nick Cox
0
Here is the solution I came up with. I had to create three variables: a cumulative point total (numerator), a running count of games (denominator), and their ratio, which gives each player's points per game:
* assumes the data are already sorted by player and, within player, by game date
gen player_pts = points if player != player[_n-1]
replace player_pts = points + player_pts[_n-1] if player == player[_n-1] & _n != 1
by player: gen player_games = _n
gen ppg = player_pts / player_games
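
For comparison, a minimal sketch of the same running average without the intermediate variables, using the approach from the other answer. It assumes the data contain a game-date variable, called date here (a hypothetical name that does not appear in the original post), and that points has no missing values. The result is stored in ppg2 so it can be compared with ppg.

```stata
* date is an assumed name for the game-date variable
* sum() is the running sum within player; _n is the game number within player
bysort player (date): gen ppg2 = sum(points) / _n
```

If points can be missing, dividing by sum(points < .) instead of _n would count only the games with a recorded score.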

Nick Cox

user3654703
-
This is correct so long as there are no missing values. Note that creating a new variable for the count is not necessary. – Nick Cox Feb 15 '15 at 23:59