Pandas: Rolling Mean and ignore NaN

Question

How do you tell pandas to ignore NaN values when calculating a rolling mean? With min_periods set, pandas returns NaN for min_periods consecutive rows whenever it encounters a single NaN.

Example:

pd.DataFrame({ 'x': [np.nan, 0, 1, 2, 3, np.nan, 5, 6, 7, 8, 9]}).rolling(3, min_periods = 3).mean()

Result:

0    NaN
1    NaN
2    NaN
3    1.0
4    2.0
5    NaN
6    NaN
7    NaN
8    6.0
9    7.0
10   8.0

Desired Result:

0    NaN
1    NaN
2    NaN
3    1.0
4    2.0
5    2.0
6    3.3
7    4.7
8    6.0
9    7.0
10   8.0

Can you explain what the mean of `[3, nan, 5]` should be? And where is that represented in your desired result? — piRSquared, Apr 07 '22 at 22:49
Ohh! I see. You want the mean of the last 3 non-null values. — piRSquared, Apr 07 '22 at 22:59

piRSquared · Answer 1 · 2022-04-07T23:07:24.183

2

You want to drop the np.nan values first, then take the rolling mean. Afterwards, reindex with the original index and forward-fill so the rows that held np.nan get the most recent rolling value.

df.x.dropna().rolling(3).mean().reindex(df.index, method='pad')

0          NaN
1          NaN
2          NaN
3     1.000000
4     2.000000
5     2.000000
6     3.333333
7     4.666667
8     6.000000
9     7.000000
10    8.000000
Name: x, dtype: float64
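
For anyone who wants to run this end to end, here is a minimal self-contained sketch using the frame from the question. The intermediate names (valid, rolled, result) are only for illustration and are not part of the one-liner above.

```python
import numpy as np
import pandas as pd

# Frame from the question, with NaN at the start and in the middle.
df = pd.DataFrame({'x': [np.nan, 0, 1, 2, 3, np.nan, 5, 6, 7, 8, 9]})

# Step 1: drop the NaN rows so every window holds 3 real values.
valid = df['x'].dropna()

# Step 2: rolling mean over the last 3 non-null values.
rolled = valid.rolling(3).mean()

# Step 3: restore the original index; method='pad' fills only the
# labels that were dropped, carrying the previous rolling value forward.
result = rolled.reindex(df.index, method='pad')

print(result)
```

Note that method='pad' only fills labels missing from the rolled series. The leading NaN values produced by rolling(3) are existing values, so they are left alone, which is why the result stays NaN until three valid observations have been seen.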

edited Apr 07 '22 at 23:07

answered Apr 07 '22 at 20:21

piRSquared


The problem with this is that it must be NaN until there are 3 valid results. If some of the first values are NaN, this will fail. I have updated the example to showcase this edge case. – Test Apr 07 '22 at 22:46
I understand your question now. I've edited my answer. Let me know if that suffices – piRSquared Apr 07 '22 at 23:07
Wow, clever, thank you! If you have time, it would be great to say how fast this is compared to a normal rolling mean. It would be good to avoid anything that would cause problems with very large frames. – Test Apr 08 '22 at 02:09
