I was trying to use rolling to compute the mean of the previous 6 days' values. With the following code, NaN is not ignored.

import pandas as pd
import numpy as np
import datetime
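# Eight timestamps spaced 4 days apart, with one missing value
timestamps = [datetime.datetime.fromtimestamp(x * 60 * 60 * 24 * 2) for x in range(0, 16, 2)]
xx = pd.DataFrame({"datetime": timestamps, "val": [2, 1, 3, np.nan, 4, 5, 6, 7]})
xx.set_index("datetime", inplace=True)
# 6-day time-based window; the second argument is min_periods=1
xx.rolling(str(6) + 'd', 1).apply(lambda x: np.nanmean(x))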

The above code gives:

                     val
datetime                
1969-12-31 18:00:00  2.0
1970-01-04 18:00:00  1.5
1970-01-08 18:00:00  2.0
1970-01-12 18:00:00  NaN
1970-01-16 18:00:00  4.0
1970-01-20 18:00:00  4.5
1970-01-24 18:00:00  5.5
1970-01-28 18:00:00  6.5

However, if I remove the datetime index and use a fixed-size window instead,

xx = pd.DataFrame([2,1,3,np.nan, 4,5,6,7],
                 columns=["val"])
yy = xx.rolling(3,1).apply(lambda x : np.nanmean(x))

the NaN is ignored:

   val
0  2.0
1  1.5
2  2.0
3  2.0
4  3.5
5  4.5
6  5.0
7  6.0

Any help is much appreciated!

Update

This is a bug and was fixed here: https://github.com/pandas-dev/pandas/pull/17156

Weiwen Gu
  • `xx.rolling(str(6)+'d',1)` does not work for me: `ValueError: window must be an integer`. Did you paste the correct code? – DYZ Jul 20 '17 at 03:26
  • Tested on both py3.6 and py2.7, and it works for me. My pandas is 0.20.3, if that helps. I think rolling on datetime is only available from 0.19.0 onward. – Weiwen Gu Jul 20 '17 at 04:21
  • This could be a bug: https://github.com/pandas-dev/pandas/issues/15901 – Weiwen Gu Jul 20 '17 at 16:47

2 Answers

1

This was confirmed as a bug and has been fixed here: https://github.com/pandas-dev/pandas/pull/17156
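
To check whether an installed pandas version behaves as expected, the rolling result can be compared against a manual computation. The following is only a sketch: the explicit dates from `pd.date_range` (instead of `fromtimestamp`, so that the index does not depend on the local timezone) and the helper `manual_nanmean` are my own assumptions for illustration, not part of the original code or of the fix.

```python
import numpy as np
import pandas as pd

# Same values as in the question, spaced 4 days apart (explicit dates are an assumption)
idx = pd.date_range("1970-01-01", periods=8, freq="4D")
xx = pd.DataFrame({"val": [2, 1, 3, np.nan, 4, 5, 6, 7]}, index=idx)

window = pd.Timedelta(days=6)


def manual_nanmean(series, window):
    """Hypothetical helper: NaN-ignoring mean over (t - window, t] for each timestamp t."""
    out = []
    for t in series.index:
        in_window = series[(series.index > t - window) & (series.index <= t)]
        out.append(np.nanmean(in_window.to_numpy()))
    return pd.Series(out, index=series.index)


expected = manual_nanmean(xx["val"], window)

# The built-in rolling mean skips NaN on its own
builtin = xx["val"].rolling("6D", min_periods=1).mean()

# On a pandas release that includes the fix, apply() should agree as well
applied = xx["val"].rolling("6D", min_periods=1).apply(np.nanmean, raw=True)

print(pd.DataFrame({"expected": expected, "builtin": builtin, "applied": applied}))
```

For the row with the missing value, the manual computation gives 3.0 (the mean of 3 and NaN with NaN ignored), not NaN. If only the mean is needed, `rolling(...).mean()` may be enough and avoids `apply` entirely.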

Weiwen Gu
0

It would probably be better to interpolate your dataframe first. Alternatively, you could backfill or forward-fill the missing values with fillna().

Try this code:

xx.interpolate(inplace=True)
yy = xx.rolling(str(6)+'d', 1).apply(lambda x: np.nanmean(x))

Tested, and it's working.

Found Similar Question Here

Geetha Ponnusamy
  • Thanks. However, I can't use interpolate because it changes the desired values. Careful interpolation could avoid that, but it would depend on which function is applied (`np.nanmean` is just used as an example here). – Weiwen Gu Jul 20 '17 at 13:56
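
To illustrate the concern raised in the comment above, here is a small sketch comparing the NaN-skipping rolling mean with the mean taken after linear interpolation. The explicit dates from `pd.date_range` and the names `skipped`, `filled`, and `changed` are assumptions made for this example, and the built-in `mean()` stands in for `np.nanmean`.

```python
import numpy as np
import pandas as pd

# Same values as in the question, spaced 4 days apart (explicit dates are an assumption)
idx = pd.date_range("1970-01-01", periods=8, freq="4D")
xx = pd.DataFrame({"val": [2, 1, 3, np.nan, 4, 5, 6, 7]}, index=idx)

# Mean over the 6-day window with the missing value skipped
skipped = xx["val"].rolling("6D", min_periods=1).mean()

# Linear interpolation first (the gap is filled with 3.5), then the same window
filled = xx["val"].interpolate().rolling("6D", min_periods=1).mean()

comparison = pd.DataFrame({"skip_nan": skipped, "interpolated": filled})

# Rows where filling the gap changes the rolling result
changed = comparison[~np.isclose(comparison["skip_nan"], comparison["interpolated"])]
print(changed)
```

Two rows differ: the windows that contain the filled value yield 3.25 and 3.75 instead of 3.0 and 4.0. Whether that difference is acceptable depends on the function being applied.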