Background
I have a monthly dataset and want to aggregate it to seasonal totals by summing the monthly values.
Here, the seasons are:
(Dec, Jan, Feb), (Mar, Apr, May), (Jun, Jul, Aug, Sep), (Oct, Nov)
The Data
import numpy as np
import pandas as pd
dti = pd.date_range("2015-12-31", periods=11, freq="M")  # month-end dates; pandas >= 2.2 prefers "ME"
df = pd.DataFrame({'time': dti,
                   'data': np.random.rand(len(dti))})
Output:
         time      data
0  2015-12-31  0.466245
1  2016-01-31  0.959309
2  2016-02-29  0.445139
3  2016-03-31  0.575556
4  2016-04-30  0.303020
5  2016-05-31  0.591516
6  2016-06-30  0.001410
7  2016-07-31  0.338360
8  2016-08-31  0.540705
9  2016-09-30  0.115278
10 2016-10-31  0.950359
Code
I was able to resample every season except Dec, Jan, Feb (DJF). Here is what I did for the other seasons, using MAM as an example:
MAM = df.loc[df['time'].dt.month.between(3,5)].resample('Y',on='time').sum()
For DJF I couldn't use between, because the months wrap around the year end, so I used a boolean mask instead:
mask = (df['time'].dt.month>11) | (df['time'].dt.month<=2)
DJF = df.loc[mask].resample('3M',origin='start',on='time').sum()
The Issue
This resampling leaves my first data point, '2015-12-31', in a bin of its own and starts the 3-month bins from 2016, even though I used origin='start'.
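One direction that might sidestep the bin edges altogether is to label each row with a season and a "season year" and then group on those labels instead of resampling. The following is only a sketch under assumptions: `season_of_month`, `season_year`, and `seasonal` are hypothetical names, the data values are replaced by 1 to 11 so the sums are easy to check, and December is assumed to belong to the winter of the following year.

```python
import numpy as np
import pandas as pd

# Month-end dates matching the sample data, written out explicitly so the
# sketch does not depend on version-specific frequency aliases.
dates = pd.to_datetime([
    "2015-12-31", "2016-01-31", "2016-02-29", "2016-03-31",
    "2016-04-30", "2016-05-31", "2016-06-30", "2016-07-31",
    "2016-08-31", "2016-09-30", "2016-10-31",
])
# Deterministic stand-in values (1.0 .. 11.0) instead of random data.
df = pd.DataFrame({"time": dates, "data": np.arange(1.0, 12.0)})

# Hypothetical lookup: month number -> season label, following the grouping
# described in the Background section.
season_of_month = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJAS", 7: "JJAS", 8: "JJAS", 9: "JJAS",
    10: "ON", 11: "ON",
}

month = df["time"].dt.month
season = month.map(season_of_month).rename("season")

# Assumption: December is counted with the following year's winter,
# so Dec 2015 lands in the same group as Jan and Feb 2016.
season_year = (df["time"].dt.year + (month == 12).astype(int)).rename("season_year")

# Sum the monthly values within each (season_year, season) group.
seasonal = df.groupby([season_year, season])["data"].sum()
print(seasonal)
```

With this labelling, Dec 2015, Jan 2016, and Feb 2016 fall into a single (2016, "DJF") group, and no separate mask is needed per season.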
So my questions are:
- How do I solve my resampling issue?
- I feel there must be a more straightforward way to do this than boolean masks. Also, is there anything similar to df['time'].dt.month.between that works on the index? I tried df.index.month.between, but between isn't available on the integer index that df.index.month returns. Repeatedly calling df.set_index and df.reset_index is quite tiresome.
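For the index case, a minimal sketch of what seems to work without `between`: `Index.isin` for a set of months, or plain comparisons combined with `&` for a range. The frame `ts` and the mask names are hypothetical, and the values are again 1 to 11 rather than the random data above.

```python
import numpy as np
import pandas as pd

# Same month-end dates as the sample data, this time used as the index.
dates = pd.DatetimeIndex([
    "2015-12-31", "2016-01-31", "2016-02-29", "2016-03-31",
    "2016-04-30", "2016-05-31", "2016-06-30", "2016-07-31",
    "2016-08-31", "2016-09-30", "2016-10-31",
], name="time")
ts = pd.DataFrame({"data": np.arange(1.0, 12.0)}, index=dates)

# Index objects have no .between, but .isin accepts any set of months,
# including ones that wrap around the year end.
djf_mask = ts.index.month.isin([12, 1, 2])
djf = ts.loc[djf_mask]

# A contiguous range can be written with two comparisons instead of .between.
mam_mask = (ts.index.month >= 3) & (ts.index.month <= 5)
mam = ts.loc[mam_mask]

print(djf)
print(mam)
```

Both masks are boolean arrays aligned with the index, so they can be passed straight to `.loc` without calling `set_index` or `reset_index`.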