0

Background

I have a monthly dataset and want to resample it to seasonal by adding monthly data.

Seasonal refers to:
(Dec,Jan,Feb), (Mar,Apr,May),(June,July,Aug,Sep),(Oct,Nov)

The Data

dti = pd.date_range("2015-12-31", periods=11, freq="M")
df = pd.DataFrame({'time':dti,
                  'data':np.random.rand(len(dti))})

Output:
        time    data
0   2015-12-31  0.466245
1   2016-01-31  0.959309
2   2016-02-29  0.445139
3   2016-03-31  0.575556
4   2016-04-30  0.303020
5   2016-05-31  0.591516
6   2016-06-30  0.001410
7   2016-07-31  0.338360
8   2016-08-31  0.540705
9   2016-09-30  0.115278
10  2016-10-31  0.950359

Code

So, I was able to do resample for other seasons except Dec, Jan, Feb (DJF). Here is what I have done for other seasons:

MAM = df.loc[df['time'].dt.month.between(3,5)].resample('Y',on='time').sum()

Since, for DJF I couldn't use between, I used a conditional statement.

mask = (df['time'].dt.month>11) | (df['time'].dt.month<=2)
DJF = df.loc[mask].resample('3M',origin='start',on='time').sum()

The Issue

This resampling leaves my first data '2015-12-31' as it is and starts from the '2016' even though I used origin = 'start'. So, my questions are basically :

  1. How do I solve my resampling issue ?
  2. I feel like there must be a more straight forward and easier way to do this rather than conditional statements. Also, Is there anything similar to using df['time'].month.between but for index. I tried using df.index.month.between but between doesn't work for int64 datetime object. I found repetitively using df.set_index and df.reset_index quite tiresome.
Moon Child
  • 102
  • 5

1 Answers1

1

Try mapping each month value to a season value then groupby resample on each season:

df['season'] = df['time'].dt.month.map({
    12: 0, 1: 0, 2: 0,
    3: 1, 4: 1, 5: 1,
    6: 2, 7: 2, 8: 2, 9: 2,
    10: 3, 11: 3
})

df = df.groupby('season').resample('Y', on='time')['data'].sum().reset_index()

df:

   season       time      data
0       0 2015-12-31  0.221993
1       0 2016-12-31  1.077451
2       1 2016-12-31  2.018766
3       2 2016-12-31  1.768848
4       3 2016-12-31  0.080741

To consider the previous December as part of the next year add MonthBegin from pandas.tseries.offsets to offset December 2015 to January 2016, then adjust all Season values forward one month:

df['time'] = df['time'] + MonthBegin(1)
df['season'] = df['time'].dt.month.map({
    1: 0, 2: 0, 3: 0,
    4: 1, 5: 1, 6: 1,
    7: 2, 8: 2, 9: 2, 10: 2,
    11: 3, 12: 3
})

df = df.groupby('season').resample('Y', on='time')['data'].sum().reset_index()

df:

   season       time      data
0       0 2016-12-31  1.299445
1       1 2016-12-31  2.018766
2       2 2016-12-31  1.768848
3       3 2016-12-31  0.080741

Sample Data Used:

np.random.seed(5)
dti = pd.date_range("2015-12-31", periods=11, freq="M")
df = pd.DataFrame({'time': dti,
                   'data': np.random.rand(len(dti))})

df:

         time      data
0  2015-12-31  0.221993
1  2016-01-31  0.870732
2  2016-02-29  0.206719
3  2016-03-31  0.918611
4  2016-04-30  0.488411
5  2016-05-31  0.611744
6  2016-06-30  0.765908
7  2016-07-31  0.518418
8  2016-08-31  0.296801
9  2016-09-30  0.187721
10 2016-10-31  0.080741
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57