3

I want to calculate the max value in the past 3 rolling rows, ignoring NaN if I see them. I assumed that skipna would do that, but it doesn't. How can I ignore NaN, and also what is skipna supposed to do?

Given this code:

import pandas as pd

df = pd.DataFrame({'sales': [25, 20, 14]})
df['max'] = df['sales'].rolling(3).max(skipna=True)
print(df)

The output is:

   sales   max
0     25   NaN
1     20   NaN
2     14  25.0

But I want it to be

   sales   max
0     25  25.0
1     20  25.0
2     14  25.0
Cleb
HAL

2 Answers

5

skipna is a parameter of Series.max and DataFrame.max, where it defaults to True. Rolling.max does not document a skipna parameter, so passing it in your code has no effect on the result, and newer pandas versions may reject the extra keyword altogether. With Series.max, setting skipna=False returns NaN as the max whenever the data contains a NaN. There is a nice explanation of why that would happen here.
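As a minimal sketch of what skipna does on a plain (non-rolling) Series, the snippet below compares the two settings. The sample values, including the NaN, are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value.
sales = pd.Series([25, np.nan, 14])

# skipna defaults to True, so the NaN is ignored.
with_skip = sales.max()

# With skipna=False, a single NaN makes the result NaN.
without_skip = sales.max(skipna=False)

print(with_skip)     # 25.0
print(without_skip)  # nan
```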

In your example, the first two rows are NaN because .rolling(3) requires 3 values in the window by default: any window with fewer than 3 non-NaN values yields NaN. You can set the second parameter (min_periods) in the .rolling() call to require at least one value:

df['max'] = df['sales'].rolling(3, min_periods=1).max()
df
#    sales   max
# 0     25  25.0
# 1     20  25.0
# 2     14  25.0
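The same setting also covers NaNs in the middle of the data, since a rolling window counts only non-NaN values against min_periods. The sketch below uses made-up data with gaps to compare the default against min_periods=1; the column names 'default' and 'max' are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with NaNs in the middle and at the end.
df = pd.DataFrame({'sales': [25, np.nan, 14, np.nan, np.nan, np.nan]})

# Default: min_periods equals the window size, so any window
# with fewer than 3 non-NaN values produces NaN.
df['default'] = df['sales'].rolling(3).max()

# min_periods=1: NaNs inside a window are ignored as long as
# at least one non-NaN value is present.
df['max'] = df['sales'].rolling(3, min_periods=1).max()

print(df)
#    sales  default   max
# 0   25.0      NaN  25.0
# 1    NaN      NaN  25.0
# 2   14.0      NaN  25.0
# 3    NaN      NaN  14.0
# 4    NaN      NaN  14.0
# 5    NaN      NaN   NaN
```

The last row stays NaN because its window contains no valid value at all.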
AlexK
1

You can also use Series.bfill with your command:

df['max'] = df['sales'].rolling(3).max().bfill()

Output:

   sales   max
0     25  25.0
1     20  25.0
2     14  25.0
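Note that back-filling copies the first full window's max into the leading rows, which matches the desired output here only because the largest value comes first. A small sketch with made-up increasing data shows how it can differ from the min_periods=1 approach:

```python
import pandas as pd

# Hypothetical data where the largest value is not in the first row.
sales = pd.Series([14, 20, 25, 10])

# Back-fill: leading NaNs take the max of the first complete window.
backfilled = sales.rolling(3).max().bfill()

# min_periods=1: each row uses only the values seen so far in its window.
partial = sales.rolling(3, min_periods=1).max()

print(backfilled.tolist())  # [25.0, 25.0, 25.0, 25.0]
print(partial.tolist())     # [14.0, 20.0, 25.0, 25.0]
```

Which result is right depends on whether the leading rows may "look ahead" at later values.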
Mayank Porwal
  • I can use df.bfill(n-1).rolling(n).max() for my application. It doesn't fix the NaNs at the start of the array but my application has NaNs in the middle which were what I actually cared about. – HAL Apr 29 '21 at 16:26