
I have minute-level data for the stock SRE in a CSV file. I imported it into a DataFrame and then tried to compute a rolling 20-day moving average, but I am getting an error. The code is:

import pandas as pd
ticker = 'SRE'
df = pd.read_csv('/Volumes/Seagate Portable/S&P 500 List/{}.txt'.format(ticker))
df.columns = ['Extra', 'Dates', 'Open', 'High', 'Low', 'Close', 'Volume']
df.drop(['Extra', 'Open', 'High', 'Volume', 'Low'], axis=1, inplace=True)
df.Dates = pd.to_datetime(df.Dates)
df.set_index('Dates', inplace=True)
df = df.between_time('9:30', '16:00')
df[f'MA {ticker}'] = df.rolling('20d').mean()

Running it raises this error:

ValueError                                Traceback (most recent call last)
<ipython-input-51-2c18a3d320d8> in <module>
      6 df.set_index('Dates', inplace=True)
      7 df = df.between_time('9:30', '16:00')
----> 8 df[f'MA {ticker}'] = df.rolling('20d').mean()
      9 
     10 # data[f'MA {ticker}'] = pd.Series

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in rolling(self, window, min_periods, center, win_type, on, axis, closed)
  11234             )
  11235 
> 11236         return Rolling(
  11237             self,
  11238             window=window,

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/window/rolling.py in __init__(self, obj, window, min_periods, center, win_type, axis, on, closed, **kwargs)
    111         self.win_freq = None
    112         self.axis = obj._get_axis_number(axis) if axis is not None else None
--> 113         self.validate()
    114 
    115     @property

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/window/rolling.py in validate(self)
   1897         ):
   1898 
-> 1899             self._validate_monotonic()
   1900 
   1901             # we don't allow center

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/window/rolling.py in _validate_monotonic(self)
   1938         """
   1939         if not (self._on.is_monotonic_increasing or self._on.is_monotonic_decreasing):
-> 1940             self._raise_monotonic_error()
   1941 
   1942     def _raise_monotonic_error(self):

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/window/rolling.py in _raise_monotonic_error(self)
   1944         if self.on is None:
   1945             formatted = "index"
-> 1946         raise ValueError(f"{formatted} must be monotonic")
   1947 
   1948     def _validate_freq(self):

ValueError: index must be monotonic
benito.cano
  • Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not a screenshot). It contains other useful information. – furas Jan 03 '21 at 23:05
  • The error is pretty clear here: the index (df.Dates) is apparently not monotonic. You should check your data for that. – Asish M. Jan 03 '21 at 23:31
  • What does it mean for it to be monotonic? (Sorry, I am new to this.) And is there a way to change that? – benito.cano Jan 03 '21 at 23:56
  • Monotonic means the values are either always decreasing or always increasing. – Asish M. Jan 04 '21 at 00:21
  • I am confused as to why they have to be always increasing or decreasing if they are stock prices. Shouldn't they be able to be any number? Is there a way to make it monotonic? This data is part of a large set, and this is one of 2 companies that have this issue. How can I fix that, or do I just have to remove these stock CSV files? – benito.cano Jan 04 '21 at 00:24
  • It's not the stock prices that need to be monotonic, but rather the index (which is presumably a timestamp field). I guess if you're rolling over every 20 days, pandas needs the dates to be sorted. – Asish M. Jan 04 '21 at 00:29
  • Is there a pandas function that will sort the data? Or is there some other way to sort the data to fix this that you know of? – benito.cano Jan 04 '21 at 00:52

1 Answer


Calling df.sort_index(inplace=True) right before the df.rolling('20D') line should do the trick.

With a time-based window such as '20D', the .rolling() function requires the rows to be ordered by the time index, i.e. the index must be monotonic (the check shown in the traceback accepts either increasing or decreasing order). This is done to prevent huge computational costs which the user does not expect. Simply sorting by index beforehand meets that requirement.
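As a minimal sketch of the fix, the snippet below builds a tiny DataFrame with made-up prices and deliberately shuffled timestamps (the values and the 'MA' column name are illustrative assumptions, not the asker's data), checks the same monotonicity condition that appears in the traceback, sorts the index, and then computes the rolling mean:

```python
import pandas as pd

# Hypothetical minute bars, deliberately out of chronological order.
idx = pd.to_datetime([
    "2021-01-04 09:31",
    "2021-01-04 09:30",
    "2021-01-05 09:30",
    "2021-01-04 09:32",
])
df = pd.DataFrame({"Close": [101.0, 100.0, 103.0, 102.0]}, index=idx)

# Same condition as in the traceback: the index must be monotonic
# in one direction or the other.
sorted_before = (
    df.index.is_monotonic_increasing or df.index.is_monotonic_decreasing
)

# Sort chronologically; after this the time-based window is accepted.
df = df.sort_index()
sorted_after = df.index.is_monotonic_increasing

# Rolling mean of the Close column over a 20-calendar-day window.
df["MA"] = df["Close"].rolling("20D").mean()
```

Selecting the Close column before calling .rolling() keeps the result a Series, so the assignment to a single new column stays unambiguous even if the frame later gains more columns.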

Dames