A time-varying mean in an expanding time window is, put another way, the mean of all values from the start of records to the current date. You don't give a time variable, so I assume that the data are in time order and supply one.
The community-contributed command rangestat (to be installed from SSC using ssc install rangestat) can give the mean of all values to date in this way:
clear
input X
50.735469
48.278413
42.807671
49.247854
52.20223
49.726689
50.823169
49.099351
48.949562
47.410434
end
gen t = _n
rangestat (count) X (mean) X, int(t . 0)
list
     +-------------------------------------+
     |        X    t   X_count      X_mean |
     |-------------------------------------|
  1. | 50.73547    1         1    50.73547 |
  2. | 48.27841    2         2   49.506941 |
  3. | 42.80767    3         3   47.273851 |
  4. | 49.24785    4         4   47.767351 |
  5. | 52.20223    5         5   48.654327 |
     |-------------------------------------|
  6. | 49.72669    6         6   48.833054 |
  7. | 50.82317    7         7   49.117356 |
  8. | 49.09935    8         8   49.115105 |
  9. | 48.94956    9         9   49.096711 |
 10. | 47.41043   10        10   48.928084 |
     +-------------------------------------+
Evidently you can ignore results for small counts as you please.
The syntax is naturally explained in the help for rangestat. Suffice it to say here that the option, namely interval(t . 0), has three parts:
- the time variable t
- an offset backwards, as far as possible: system missing . here means arbitrarily large
- an offset forwards, just 0
In mathematical terms the mean is taken over times from minus infinity, or as far back as the data go, to the present, an offset of 0 from the current time.
The count result is the number of observations in the window with non-missing values on X. Here, as the time variable runs 1, 2, 3, and so on, the count is trivially the same as the time variable, but in real problems the time variable is much more likely to be a date of some kind. Unlike some other commands, rangestat doesn't have an option to insist on a minimum number of points with non-missing values in a window, but you can count how many there are and decide to ignore results based on too few data. That is left to the user here.
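As a minimal sketch of ignoring results based on too few data, assuming the rangestat call above has been run so that X_count and X_mean exist: the threshold of 3 and the variable name X_mean3 are purely illustrative choices of mine, not anything supplied by rangestat.

```stata
* Keep the expanding mean only where the window holds at least 3
* non-missing values; the threshold 3 is a hypothetical choice.
* Where X_count < 3 the new variable is left missing.
gen double X_mean3 = X_mean if X_count >= 3

list t X_count X_mean X_mean3
```

With the example data, X_mean3 is missing for the first two observations and equal to X_mean thereafter.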
Incidentally, you could make a good start on this kind of problem by working out a cumulative sum and then dividing by the number of values so far. That needs care with (e.g.) gaps in data, irregularly spaced data or missing values, and a virtue of rangestat is that it handles all such difficulties.
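A rough sketch of the cumulative sum approach follows, continuing with the example data. It assumes one observation per value of t and no panel structure; the variable names X_runsum, X_runn and X_runmean are my own. It relies on the fact that sum() treats missing values as zero when forming a running sum.

```stata
* Sketch only: assumes a single series with one observation per time t.
sort t

* Running sum of X; sum() treats missing values of X as zero.
gen double X_runsum = sum(X)

* Running count of non-missing values of X so far.
gen long X_runn = sum(!missing(X))

* Expanding mean; division by zero yields missing if no values so far.
gen double X_runmean = X_runsum / X_runn

list t X X_runsum X_runn X_runmean
```

For the example data this should agree with X_mean from rangestat up to rounding. With repeated times or panel data the two approaches would need more care to match, which is where rangestat earns its keep.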