1

I have a dataframe with time indices and need to partition it into overlapping segments (say 5 days in length with 1 day offset).

Here is a sample dataframe:

import pandas as pd    
i=pd.date_range('05-01-2015', '21-05-2018', freq='D')
x=pd.DataFrame(index=i, data = np.random.randint(0, 100, len(i)))

After partitioning the indices of the segments should range respectively from 2015-05-01 to 2015-05-05, from 2015-05-02 to 2015-05-06, from 2015-05-03 to 2015-05-08 etc. I assume there should be some trick with groupby to do it, but couldn't come up with an efficient implementation.

As a reference, this is implemented in mathematica:

Partition[list,n,d] - generates sublists with offset d.

I would greatly appreciate any insight you guys can provide.

Al Guy
  • 191
  • 11
  • I'm a little confused by the date format you're using, is it `DD-MM-YYYY` or `MM-DD-YYYY`? – economy Feb 07 '20 at 22:10
  • 2
    I think you want `df.rolling(5)`? As in a rolling window of 5 rows: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html. This also has "offset" capability, for a fixed time period, if your time series is not uniform. – alkasm Feb 07 '20 at 22:12

1 Answers1

2

As @alkasm pointed out, you can perform aggregate calculations on 5-day partitions using your sample dataframe using rolling in pandas:

import pandas as pd    
i = pd.date_range('05-01-2015', '21-05-2018', freq='D')
x = pd.DataFrame(index=i, data = np.random.randint(0, 100, len(i)))

x.rolling(5).sum()

The trick with this is, it calculates the aggregate function looking backwards, so you may need to expand your index to include the four days prior. A row in your dataframe with index 2018-05-06 with a 5-row rolling() call would represent the segment 2018-05-02 to 2018-05-06 inclusive (which is 5 days).

economy
  • 4,035
  • 6
  • 29
  • 37