0

Suppose I have the data generated by this:

clear all
set seed 100
set obs 36
egen group = seq(), from(1) to(2) block(18)
egen year = seq(), from(2000) to(2005) block(3)
egen month = seq(), from(1) to(3)
gen y = round(runiform()*10)
sort group month year
replace y = . in 3 
replace y = . in 7 
replace y = . in 11
replace y = . in 19
replace y = . in 28

We will focus on the first 6 observations for expository purposes:

group   year    month   y
1       2000    1       10
1       2001    1       1
1       2002    1       
1       2003    1       9
1       2004    1       5
1       2005    1       6

What I want to do is to use egen to create a moving average of y. In other words, take the average based on the previous 3 years before the current year (including current year); if the year is not in the data, do not use that year. For year 2000, the moving average is 10. We want to ignore missings in the calculation; but only ever go back 3 years. For the row corresponding to year 2005, it would be (20/3). For2004, it would be5(and not10/3`).

Here is some incorrect code to try to get at this.

bys group month: egen avg = mean(temp) if year>year[_n]-3 & year<=year[_n]

This produces missing values everywhere. What I am trying to do is to calculate a separate number for each month day, but have this use data from the whole bysort group, assuming that the data meet the criteria of being 3 years back.

In my line of incorrect code, in the first group month group, I want it to start at obs. 1. It should compute the average for all observations in value of year is greater than 1997 and less than or equal to 2000. In this case, it is only the first observation.

Then it goes to observation 2. It uses 2001 for the values of year[_n] and computes the average based on the first two observations, because these are the ones that meet the criteria.

Is what I am trying to describe possible using egen? This is a general question that extends beyond the moving average application.

Also, if it is not possible, then is the following the best solution to calculate the moving average (once again only going back 3 years and ignoring missings in the calculation)?:

sort group month year
forvalues i = 1/3 {
    bys group: gen y_`i' = y[_n-`i']
}

bys group month: egen avg = mean(y) if year>year[_n]
egen ma_3 = rowmean(y y_1 y_2 y_3)
bill999
  • 2,147
  • 8
  • 51
  • 103
  • 1
    Quick advice: From `help egen`, we have "Explicit subscripting (using _N and _n), which is commonly used with `generate`, should not be used with `egen`". – Roberto Ferrer Oct 10 '15 at 02:33

1 Answers1

1

You can use tsegen (from SSC) to calculate statistics over a rolling window of time. I'm not sure I understand how you group your observations since you have a month variable but the following appears to do what you are looking for:

clear all
set seed 100
set obs 36
egen group = seq(), from(1) to(2) block(18)
egen year = seq(), from(2000) to(2005) block(3)
egen month = seq(), from(1) to(3)
gen y = round(runiform()*10)
sort group month year
replace y = . in 3 
replace y = . in 7 
replace y = . in 11
replace y = . in 19
replace y = . in 28

* create a panel variable by grouping the group and month variable
isid group month year, sort
egen group_month = group(group month)

* declare data to be a time-series
tsset group_month year

* calculate a moving average over 3 years
tsegen avg = rowmean(L(0/2).y)
Nick Cox
  • 35,529
  • 6
  • 31
  • 47
Robert Picard
  • 1,051
  • 6
  • 9
  • Works great! Minor point, but to match my description (where I want to take the moving average over this year and the previous 3 years), I think the last line of code should be `tsegen avg = rowmean(L(0/3).y)` instead of `tsegen avg = rowmean(L(0/2).y)` – bill999 Oct 10 '15 at 13:19