How do you tell pandas to ignore NaN values when calculating a rolling mean? With min_periods set, pandas returns NaN for min_periods consecutive rows whenever it encounters a single NaN.

Example:

df = pd.DataFrame({'x': [np.nan, 0, 1, 2, 3, np.nan, 5, 6, 7, 8, 9]})
df.rolling(3, min_periods=3).mean()

Result:

0    NaN
1    NaN
2    NaN
3    1.0
4    2.0
5    NaN
6    NaN
7    NaN
8    6.0
9    7.0
10   8.0

Desired Result:

0    NaN
1    NaN
2    NaN
3    1.0
4    2.0
5    2.0
6    3.3
7    4.7
8    6.0
9    7.0
10   8.0
Test

1 Answer

Drop the np.nan values first, then take the rolling mean, so that each window spans only valid values. Afterwards, reindex with the original index and forward fill, so the rows that were dropped take the most recent computed mean.

df.x.dropna().rolling(3).mean().reindex(df.index, method='pad')

0          NaN
1          NaN
2          NaN
3     1.000000
4     2.000000
5     2.000000
6     3.333333
7     4.666667
8     6.000000
9     7.000000
10    8.000000
Name: x, dtype: float64
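
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The helper name rolling_mean_skip_nan is made up for illustration, and the sketch assumes the index is monotonic, which reindex needs when a fill method is given.

```python
import numpy as np
import pandas as pd

# Same data as in the question, including the leading NaN edge case.
df = pd.DataFrame({'x': [np.nan, 0, 1, 2, 3, np.nan, 5, 6, 7, 8, 9]})


def rolling_mean_skip_nan(s, window):
    """Hypothetical helper: rolling mean over the last `window` non-NaN values."""
    valid = s.dropna()                    # windows now span only valid rows
    means = valid.rolling(window).mean()  # NaN until `window` valid values exist
    # Put the results back on the original index. Labels that were dropped
    # take the value at the closest earlier label (forward fill by label).
    return means.reindex(s.index, method='pad')


result = rolling_mean_skip_nan(df['x'], 3)
print(result)
```

Note that the padding works on index labels, not on NaN values: rows 1 and 2 are still present after dropna, so their NaN means are kept rather than filled.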
piRSquared
  • The problem with this is that it must be NaN until there are 3 valid results. If some of the first values are NaN, this will fail. I have updated the example to showcase this edge case. – Test Apr 07 '22 at 22:46
  • I understand your question now. I've edited my answer. Let me know if that suffices – piRSquared Apr 07 '22 at 23:07
  • Wow, clever, thank you! If you have time, it would be great if you could say how fast this is compared to a normal rolling mean. I would like to avoid anything that becomes a problem with very large frames. – Test Apr 08 '22 at 02:09
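
No timings are given in this thread, so one way to answer the speed question is to measure it on data shaped like your own. The sketch below is only a starting point: the series length, the roughly 10% NaN share, and the function names plain and skip_nan are all assumptions chosen for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (assumption): 200,000 rows with roughly 10% NaN.
rng = np.random.default_rng(0)
values = rng.normal(size=200_000)
values[rng.random(values.size) < 0.1] = np.nan
s = pd.Series(values)


def plain(series):
    # Ordinary rolling mean: any NaN in the window yields NaN.
    return series.rolling(3).mean()


def skip_nan(series):
    # Approach from the answer: drop NaN, roll, reindex with forward fill.
    return series.dropna().rolling(3).mean().reindex(series.index, method='pad')


# Total seconds for 5 runs of each variant.
t_plain = timeit.timeit(lambda: plain(s), number=5)
t_skip = timeit.timeit(lambda: skip_nan(s), number=5)
print(f"plain rolling: {t_plain:.4f} s, dropna + reindex: {t_skip:.4f} s")
```

The extra cost comes from the dropna and reindex passes on top of the rolling mean itself, so the numbers depend on the frame size and the share of NaN values.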