I would like to resample a DataFrame at a 10-day frequency ('10D'), but with the last decade of each month always ending at the end of the month. For example:

print(df)
            data
index
2010-01-01  145.08
2010-01-02  143.69
2010-01-03  101.06
2010-01-04  57.63
2010-01-05  65.46
...
2010-02-24  48.06
2010-02-25  87.41
2010-02-26  71.97
2010-02-27  73.1
2010-02-28  41.43

I would like to apply something like df.resample('10DM').mean() (a hypothetical frequency alias) and get:

           data
index
2010-01-10  97.33
2010-01-20  58.58
2010-01-31  41.43
2010-02-10  35.17
2010-02-20  32.44
2010-02-28  55.44

Note that the 1st and 2nd decades are a normal 10D resample, but the 3rd can be 8, 9, 10, or 11 days long depending on the month and year.

Thanks in advance.

1 Answer

Sample data (easy to check):

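>>> import numpy as np
>>> import pandas as pd
>>> # dti is assumed to be a daily index covering the dates shown below
>>> dti = pd.date_range('2010-01-01', '2010-02-28', freq='D')
>>> df = pd.DataFrame({"value": np.arange(1, len(dti)+1)}, index=dti)
>>> df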
            value
2010-01-01      1
2010-01-02      2
2010-01-03      3
2010-01-04      4
2010-01-05      5
...
2010-02-24     55
2010-02-25     56
2010-02-26     57
2010-02-27     58
2010-02-28     59

You need to create groups by (decade of the month, month, year):

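# Bin the day of month into three decades: (0, 10], (10, 20], (20, 31],
# then split by month and by year (on pandas >= 2.2 use freq='ME' and freq='YE').
grp = df.groupby([pd.cut(df.index.day, [0, 10, 20, 31]),
                  pd.Grouper(freq='M'),
                  pd.Grouper(freq='Y')])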
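
To see what the pd.cut step does on its own, here is a small sketch on a hand-picked array of day numbers (the `days` array is made up for illustration and is not part of the sample data). Bins are closed on the right by default, so day 10 lands in the first decade and day 20 in the second:

```python
import numpy as np
import pandas as pd

# Hypothetical day-of-month values chosen to sit on the bin edges
days = np.array([1, 10, 11, 20, 21, 28, 31])

# Right-closed bins: (0, 10], (10, 20], (20, 31]
decades = pd.cut(days, [0, 10, 20, 31])

# Integer code of the bin each day falls into
codes = decades.codes.tolist()
print(codes)  # [0, 0, 1, 1, 2, 2, 2]
```

The third bin absorbs every day after the 20th, which is why the last decade can be 8 to 11 days long.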

Now you can compute the mean for each group:

out = grp['value'].apply(lambda x: (x.index.max(), x.mean())).apply(pd.Series) \
                  .reset_index(drop=True).rename(columns={0:'date', 1:'value'}) \
                  .set_index('date').sort_index()

Output result:

>>> out
            value
date
2010-01-10    5.5
2010-01-20   15.5
2010-01-31   26.0
2010-02-10   36.5
2010-02-20   46.5
2010-02-28   55.5
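
As a possible alternative, the same grouping can be sketched by computing the decade-end date for every row and grouping on that label directly. This is only an illustration under the assumption of a DatetimeIndex without a time zone; the helper name `decade_end_labels` is hypothetical and not a pandas function:

```python
import numpy as np
import pandas as pd

def decade_end_labels(idx):
    """Map each timestamp to the last calendar day of its decade:
    the 10th, the 20th, or the end of the month."""
    day = idx.day
    end_day = np.where(day <= 10, 10,
                       np.where(day <= 20, 20, idx.days_in_month))
    # First day of each row's month, then shift forward to the decade end
    month_start = idx.to_period("M").to_timestamp()
    return month_start + pd.to_timedelta(end_day - 1, unit="D")

# Same sample data as above
dti = pd.date_range("2010-01-01", "2010-02-28", freq="D")
df = pd.DataFrame({"value": np.arange(1, len(dti) + 1)}, index=dti)

out = df.groupby(decade_end_labels(df.index)).mean().rename_axis("date")
print(out)
```

On the sample data this gives the same six rows as the output above. One difference to keep in mind: the label here is always the calendar end of the decade, whereas x.index.max() gives the last date actually present in each group, so the two can differ when a decade is only partly covered by the data.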
Corralien