Suppose I have the data generated by this:
clear all
set seed 100
set obs 36
egen group = seq(), from(1) to(2) block(18)
egen year = seq(), from(2000) to(2005) block(3)
egen month = seq(), from(1) to(3)
gen y = round(runiform()*10)
sort group month year
replace y = . in 3
replace y = . in 7
replace y = . in 11
replace y = . in 19
replace y = . in 28
We will focus on the first 6 observations for expository purposes:
group year month y
1 2000 1 10
1 2001 1 1
1 2002 1
1 2003 1 9
1 2004 1 5
1 2005 1 6
What I want to do is to use egen
to create a moving average of y
. In other words, take the average based on the previous 3 years before the current year (including current year); if the year is not in the data, do not use that year. For year 2000
, the moving average is 10
. We want to ignore missings in the calculation; but only ever go back 3 years. For the row corresponding to year 2005
, it would be (20/3). For
2004, it would be
5(and not
10/3`).
Here is some incorrect code to try to get at this.
bys group month: egen avg = mean(temp) if year>year[_n]-3 & year<=year[_n]
This produces missing values everywhere. What I am trying to do is to calculate a separate number for each month day, but have this use data from the whole bysort group, assuming that the data meet the criteria of being 3 years back.
In my line of incorrect code, in the first group month
group, I want it to start at obs. 1
. It should compute the average for all observations in value of year is greater than 1997
and less than or equal to 2000
. In this case, it is only the first observation.
Then it goes to observation 2
. It uses 2001
for the values of year[_n] and computes the average based on the first two observations, because these are the ones that meet the criteria.
Is what I am trying to describe possible using egen
? This is a general question that extends beyond the moving average application.
Also, if it is not possible, then is the following the best solution to calculate the moving average (once again only going back 3 years and ignoring missings in the calculation)?:
sort group month year
forvalues i = 1/3 {
bys group: gen y_`i' = y[_n-`i']
}
bys group month: egen avg = mean(y) if year>year[_n]
egen ma_3 = rowmean(y y_1 y_2 y_3)