How to calculate time varying historical mean with Stata

Question

How can I calculate the mean of X using an expanding window with at least four observations?

Here is a numeric example:

score 1 · Accepted Answer · edited Mar 16 '19 at 20:31

A time-varying mean over an expanding window is simply the mean of all values from the start of the records up to the current date. You don't give a time variable, so I assume the data are in order and supply one.

The community-contributed command rangestat (to be installed from SSC using ssc install rangestat) can give the mean of all values to date in this way:

clear 
input X
50.735469
48.278413
42.807671
49.247854
52.20223
49.726689
50.823169
49.099351
48.949562
47.410434
end 

gen t = _n 

rangestat (count) X (mean) X, int(t . 0) 

list 

     +-------------------------------------+
     |        X    t   X_count      X_mean |
     |-------------------------------------|
  1. | 50.73547    1         1    50.73547 |
  2. | 48.27841    2         2   49.506941 |
  3. | 42.80767    3         3   47.273851 |
  4. | 49.24785    4         4   47.767351 |
  5. | 52.20223    5         5   48.654327 |
     |-------------------------------------|
  6. | 49.72669    6         6   48.833054 |
  7. | 50.82317    7         7   49.117356 |
  8. | 49.09935    8         8   49.115105 |
  9. | 48.94956    9         9   49.096711 |
 10. | 47.41043   10        10   48.928084 |
     +-------------------------------------+

You can ignore results for small counts (here, counts below four) as you please.

The syntax is explained in the help for rangestat. Briefly, the option interval(t . 0) takes three arguments: the time variable t, followed by two offsets. The first offset is how far to look backwards: system missing (.) here means arbitrarily far, so as far back as possible. The second offset is how far to look forwards: 0 here means no further than the current time.

In mathematical terms, the mean for each observation is taken over a window that runs from minus infinity, or as far back as the data go, to offset 0, the present time.
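To see how the two offsets work, here is a contrasting sketch, not what the question asks for. Replacing the missing first offset with -3 restricts each window to the current time and the three time units before it. It uses only the first five values of the example data and assumes rangestat is already installed from SSC.

```stata
* Contrast with interval(t . 0): a fixed-length window, not an expanding one
* Assumes rangestat is installed (ssc install rangestat)
clear
input X
50.735469
48.278413
42.807671
49.247854
52.20223
end

gen t = _n

* Window for each observation is t - 3 through t + 0
rangestat (count) X (mean) X, interval(t -3 0)

list
```

With the expanding window the count keeps growing with t; here it cannot exceed four.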

The count result is the number of observations in the window with non-missing values of X. Here the time variable runs from 1 upwards, so the count is trivially the same as the time variable, but in real problems the time variable is much more likely to be a date of some kind. Unlike some other commands, rangestat has no option to insist on a minimum number of non-missing values in a window, but you can count how many there are and ignore results based on too few. That is left to the user here.
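One way to apply the minimum of four observations asked for in the question is to copy the mean only where the count is large enough. This is a minimal sketch using the first five values of the example data. The variable name X_mean4 is made up for illustration, and rangestat is assumed to be installed from SSC.

```stata
* Assumes rangestat is installed (ssc install rangestat)
clear
input X
50.735469
48.278413
42.807671
49.247854
52.20223
end

gen t = _n

* Expanding-window count and mean, as in the answer
rangestat (count) X (mean) X, interval(t . 0)

* Keep the mean only where at least four non-missing values contribute;
* X_mean4 (a hypothetical name) is missing otherwise
gen double X_mean4 = X_mean if X_count >= 4

list
```

Creating a new variable rather than overwriting X_mean keeps the unrestricted means available for checking.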

Incidentally, you could make a good start on this kind of problem by working out a cumulative sum and then dividing by the number of values so far. That needs care with (e.g.) gaps in data, irregularly spaced data or missing values, and a virtue of rangestat is that it handles all such difficulties.
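The cumulative-sum approach could be sketched as follows, using the first five values of the example data. It assumes exactly one observation per time point and data sorted by t. The names run_total, run_n and cum_mean are made up for illustration. Stata's sum() function returns a running sum and treats missing values as zero, so missing values of X are skipped in the total, and the divisor counts only non-missing values.

```stata
clear
input X
50.735469
48.278413
42.807671
49.247854
52.20223
end

gen t = _n
sort t

* Running total of X; sum() treats missing values as zero
gen double run_total = sum(X)

* Running count of non-missing values of X
gen long run_n = sum(!missing(X))

* Mean of all values so far
gen double cum_mean = run_total / run_n

list
```

With tied or irregularly spaced times this running calculation would need extra care, which is where rangestat is the safer choice.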
