
I have a pandas DataFrame with a time index, like this:

import pandas as pd
import numpy as np

idx = pd.date_range(start='2000',end='2001')
df = pd.DataFrame(np.random.normal(size=(len(idx),2)),index=idx)

which looks like this:

                   0         1
2000-01-01  0.565524  0.355548
2000-01-02 -0.234161  0.888384

I would like to compute a rolling average like

df_avg = df.rolling(60).mean()

but always excluding the entries corresponding to (let's say) 10 days before, +- 2 days. In other words, for each date df_avg should contain the mean (exponential with ewm, or flat) of the previous 60 entries, excluding the entries from t-48 to t-52. I guess I should use some kind of rolling mask, but I don't know how. I could also compute two separate averages and obtain the result as a difference, but that looks dirty, and I wonder if there is a better way that generalizes to other non-linear computations...

Many thanks!

cs95
stan

2 Answers


You can use apply to customize your function:

# select the positions within each 60-row window that you want to average over
avg_idx = [i for i in range(60) if i not in range(8, 13)]

# do the rolling computation, averaging only over the selected positions;
# raw=True passes each window as a NumPy array, so positional indexing works
df_avg = df.rolling(60).apply(lambda x: x[avg_idx].mean(), raw=True)

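The x passed to apply is the window of a single column (a NumPy array with raw=True, a pandas Series with raw=False), not a DataFrame. Once the window is full it always has 60 entries, so you can specify your positional index based on this, knowing that the first entry (0) is t-59 and the last entry (59) is t itself.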

I am not entirely sure about your exclusion logic, but you can easily modify my solution for your case.
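
For example, here is a minimal sketch of one way to adapt it, assuming "t-48 to t-52" means the rows 48 to 52 positions before the current one. The names window, lags and keep are mine, not part of pandas, and the seeded generator is only there to make the example reproducible.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start='2000', end='2001')
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(len(idx), 2)), index=idx)

window = 60
# Position p inside a window corresponds to lag (window - 1 - p),
# so the lags run from 59 (oldest row) down to 0 (current row).
lags = np.arange(window - 1, -1, -1)
# Boolean mask: True for rows to keep, False for t-52 .. t-48.
keep = ~((lags >= 48) & (lags <= 52))

# raw=True hands each window to the function as a NumPy array.
df_avg = df.rolling(window).apply(lambda x: x[keep].mean(), raw=True)
```

Since the mask is applied before the statistic, the same pattern should work for non-linear functions too, e.g. np.median(x[keep]) instead of x[keep].mean().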

FLab
  • thanks, it looks like the right method but rolling().apply() has a twisted logic and makes use of numpy array.. your code gives me the following error: ... 'numpy.ndarray' object has no attribute 'iloc' – stan May 18 '18 at 15:23
  • you are right. I updated my solution to adjust for this (very small adjustment). Also note that in the latest version of pandas they (re)introduced a 'raw' parameter, that if set to False passes a pandas Series to apply. – FLab May 18 '18 at 15:29
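
To illustrate the raw parameter mentioned in the comments, here is a small sketch comparing the two modes. It assumes a pandas version that has the raw argument (0.23 or later); the variable names avg_raw and avg_series are just for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start='2000', end='2001')
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(len(idx), 2)), index=idx)

avg_idx = [i for i in range(60) if i not in range(8, 13)]

# raw=True: x is a numpy.ndarray, so plain positional indexing works
avg_raw = df.rolling(60).apply(lambda x: x[avg_idx].mean(), raw=True)

# raw=False: x is a pandas Series indexed by date, so use .iloc for positions
avg_series = df.rolling(60).apply(lambda x: x.iloc[avg_idx].mean(), raw=False)
```

Both calls should give the same numbers; raw=True is usually the faster of the two because no Series has to be built for every window.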

Unfortunately not: rolling has no built-in option for excluding part of the window. From the pandas source code:

df.rolling(window, min_periods=None, freq=None, center=False, win_type=None, 
           on=None, axis=0, closed=None)

window : int, or offset
    Size of the moving window. This is the number of observations used for
    calculating the statistic. Each window will be a fixed size.

    If its an offset then this will be the time period of each window. Each
    window will be a variable sized based on the observations included in
    the time-period.
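
Since there is no built-in option, the "difference" idea from the question can be sketched with two rolling sums and a shift. This is only a sketch for the flat mean, assuming the excluded rows are t-52 to t-48; the names first_lag, last_lag and df_avg_fast are made up for the example.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start='2000', end='2001')
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(len(idx), 2)), index=idx)

window, first_lag, last_lag = 60, 48, 52
n_excluded = last_lag - first_lag + 1  # 5 rows are dropped

# Sum over the full window t-59 .. t
total = df.rolling(window).sum()
# A 5-row sum ending at row s covers s-4 .. s; shifting it by 48 rows
# makes the value at row t cover t-52 .. t-48.
excluded = df.rolling(n_excluded).sum().shift(first_lag)

df_avg_fast = (total - excluded) / (window - n_excluded)
```

This avoids calling a Python function per window, but it only works because the mean is linear in the data; it would not carry over to something like a median.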
koPytok