The code below is from the official pandas example. The standard deviation of the last three numbers (5, 5, 5) should be 0, but the example returns a small nonzero value instead.

In [1]: s = pd.Series([5,5,6,7,5,5,5])

In [2]: s.rolling(3).std()
Out[2]:
0             NaN
1             NaN
2    5.773503e-01
3    1.000000e+00
4    1.000000e+00
5    1.154701e+00
6    2.580957e-08
dtype: float64

If I reverse the Series, the results look correct, and I don't know why.

In [3]: s[::-1].rolling(3).std()
Out[3]:
6         NaN
5         NaN
4    0.000000
3    1.154701
2    1.000000
1    1.000000
0    0.577350
dtype: float64
Craig
Jake
  • You are dealing with floating point numbers. 2.58e-8 is effectively zero compared to all the other numbers. – Frank Yellin Jan 08 '22 at 04:09
  • And to answer your more specific question: `rolling(3).std()` is probably using a rolling algorithm to calculate the standard deviation rather than recalculating it for each window of three values. There is ample opportunity for small errors to creep in. – Frank Yellin Jan 08 '22 at 04:12
  • @FrankYellin Thanks for your answers. So you are saying it is negligible? For this example only, or also for more complex calculations? – Jake Jan 08 '22 at 04:25

1 Answer

What you see is the result of small rounding errors in the floating-point arithmetic used to compute the standard deviation over a rolling window. In earlier versions of pandas, the code that calculates standard deviation and variance automatically caught small values and rounded them to zero. This was found to cause problems when computing the standard deviation (or variance) of data whose values are themselves small, so the automatic rounding was removed. The issue is discussed in:

https://github.com/pandas-dev/pandas/issues/37051

and the change was made in:

https://github.com/pandas-dev/pandas/pull/40505

In issue 37051, they mention the need to update the documentation, but this change does not appear to be reflected in the current online documentation.
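To illustrate where such a residue can come from, here is a minimal pure-Python sketch. It assumes a Welford-style update in which each new value is added to a running mean and sum of squared deviations and the oldest value is then removed. This is only an approximation of the idea; the actual pandas implementation is compiled code and may differ in its details. The function names `rolling_std_online` and `rolling_std_direct` are made up for this example, and the sketch assumes `window >= 2`.

```python
import math


def rolling_std_online(values, window):
    """Sample std over a sliding window, updated incrementally (assumed scheme)."""
    n = 0       # number of values currently in the window
    mean = 0.0  # running mean of the window
    m2 = 0.0    # running sum of squared deviations from the mean
    out = []
    for i, x in enumerate(values):
        # Add the incoming value (Welford update).
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        # Remove the value that just left the window (reverse update).
        if i >= window:
            old = values[i - window]
            n -= 1
            delta = old - mean
            mean -= delta / n
            m2 -= delta * (old - mean)
        if n < window:
            out.append(float("nan"))
        else:
            # Rounding can leave m2 slightly off zero; clamp negatives before sqrt.
            out.append(math.sqrt(max(m2, 0.0) / (n - 1)))
    return out


def rolling_std_direct(values, window):
    """Sample std recomputed from scratch for every window."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(float("nan"))
            continue
        chunk = values[i - window + 1 : i + 1]
        m = sum(chunk) / window
        out.append(math.sqrt(sum((v - m) ** 2 for v in chunk) / (window - 1)))
    return out


data = [5.0, 5.0, 6.0, 7.0, 5.0, 5.0, 5.0]

if __name__ == "__main__":
    for online, direct in zip(rolling_std_online(data, 3), rolling_std_direct(data, 3)):
        print(online, direct)
```

Recomputing each window from scratch gives exactly 0 for (5, 5, 5). The incremental version carries state over from earlier windows (for example a running mean of 16/3, which has no exact binary representation), so its last value may end up a tiny distance from 0. That would also be consistent with the reversed Series returning an exact 0, since there the (5, 5, 5) window comes first and has no earlier state to inherit.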

To replicate the behavior of earlier pandas versions, you can manually set any values below a small threshold to 0:

In [10]: s_std = s.rolling(3).std()

In [11]: s_std
Out[11]:
0             NaN
1             NaN
2    5.773503e-01
3    1.000000e+00
4    1.000000e+00
5    1.154701e+00
6    2.580957e-08
dtype: float64

In [12]: s_std[s_std < 1e-7] = 0

In [13]: s_std
Out[13]:
0         NaN
1         NaN
2    0.577350
3    1.000000
4    1.000000
5    1.154701
6    0.000000
dtype: float64
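The same clean-up can also be written without modifying the Series in place. The sketch below wraps it in a small helper; the name `zero_small` and the default threshold `tol=1e-7` are assumptions chosen for this example, not part of pandas.

```python
import pandas as pd


def zero_small(values: pd.Series, tol: float = 1e-7) -> pd.Series:
    """Return a copy in which entries with magnitude below tol are set to 0.0."""
    # Comparisons with NaN are False, so the leading NaNs of a rolling
    # result are left unchanged.
    return values.mask(values.abs() < tol, 0.0)


s = pd.Series([5, 5, 6, 7, 5, 5, 5])
s_std = zero_small(s.rolling(3).std())
print(s_std)
```

A fixed absolute threshold has the same drawback that led pandas to drop its automatic rounding: if the data are themselves very small, a genuine standard deviation can fall below `tol` and be zeroed out. Pick the threshold relative to the scale of your data.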
Craig