When using rolling on a Series that contains inf values, the result contains NaN even if the operation is well defined for inf, such as min or max. For example:

import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.inf, 5, 6])
print(s.rolling(window=3).min())

This gives:

0    NaN
1    NaN
2    1.0
3    NaN
4    NaN
5    NaN
dtype: float64

while I expected

0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
5    5.0

Computing the minimum of the series directly works as expected:

s.min()  # 1.0

What is the reason for additional NaN values being introduced?


Python 3.8.1, pandas 1.0.2

a_guest

1 Answer

np.inf is explicitly converted to np.nan in pandas/core/window/rolling.py:

# Convert inf to nan for C funcs
inf = np.isinf(values)
if inf.any():
    values = np.where(inf, np.nan, values)

The question "How to represent inf or -inf in Cython with numpy?" gives some background on why this was done.
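To see that this conversion alone accounts for the output in the question, here is a minimal sketch that performs the same inf-to-NaN replacement by hand before rolling. The names converted and emulated are only illustrative; the result does not depend on how pandas treats inf internally, because the inf is already gone before rolling is called.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.inf, 5, 6])

# Mimic the conversion by hand: every inf becomes NaN before rolling.
converted = s.replace(np.inf, np.nan)

# With the default min_periods (equal to the window size), any window
# containing a NaN has too few valid observations and yields NaN.
emulated = converted.rolling(window=3).min()
print(emulated)
```

Only index 2 has a full window of finite values, so it is the only position that gets a number; every later window overlaps the converted entry at index 3.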


You'd find the exact same behavior if you had NaN instead of np.inf. Getting your expected output is awkward because min_periods discards those intermediate windows: once inf has become NaN, they lack enough valid observations. One clean "hack" is to replace inf with the largest representable float, which should be fairly safe when taking the min.

import numpy as np
s.replace(np.inf, np.finfo('float64').max).rolling(3).min()

#0    NaN
#1    NaN
#2    1.0
#3    2.0
#4    3.0
#5    5.0
#dtype: float64
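Another option, sketched here under the assumption that the data has no genuine NaN values and no window made up entirely of inf, is to relax min_periods and then restore the leading NaNs by hand. The names window and relaxed are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.inf, 5, 6])
window = 3

# Treat +inf as missing, but let a window produce a value as soon as
# it holds at least one valid observation.
relaxed = s.replace(np.inf, np.nan).rolling(window, min_periods=1).min()

# Restore the leading NaNs that a full-window requirement would give.
relaxed.iloc[: window - 1] = np.nan
print(relaxed)
```

Unlike the replacement with the largest float, a window consisting only of inf would come out as NaN here, and real missing values would be skipped silently, so this is only suitable when neither case can occur.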
ALollz
  • Given the link to the source code I see why this result emerges but I don't understand why it is converted in the first place. `inf` is defined in the floating point standard so why would it cause any trouble? I read the question you linked but couldn't find the relevant information. – a_guest Mar 19 '20 at 23:28
  • @a_guest I don't know enough about Cython to say exactly why this is done. Perhaps there's a better link somewhere else; I'll see if I can find one. – ALollz Mar 19 '20 at 23:32