0

I have data that i want to resample use end of month based on original df but when i use df.resample('M').last(). the end of month date that i got is different from original df. see the asterix marks. 2005-12-31 should be >> 2005-12-29. any suggestion ? what parameter should i add into .resample() ?

orginal df =

DATE
2005-12-27    1161.707
2005-12-28    1164.143
*2005-12-29    1162.635*
2006-01-02    1171.709
2006-01-03    1184.690
2006-01-04    1211.699

test_resample = df.resample('M').last()

DATE
2005-11-30    1096.641
*2005-12-31    1162.635*
2006-01-31    1232.321
stvlam22
  • 63
  • 7

2 Answers2

2

You can't directly with resample, you should instead groupby.agg after temporarily resetting the index:

(df.reset_index()
   .groupby(df.index.to_period('M'))
   .agg({'DATE': 'last', 'value': 'last'})
   .set_index('DATE')
)

Output:

               value
DATE                
2005-12-29  1162.635
2006-01-04  1211.699
mozway
  • 194,879
  • 13
  • 39
  • 75
  • HI mozway,, sorry for keep asking if i have multiple columns. what should i do ? cols = df.columns.tolist(), test = df.reset_index().groupby(df.index.to_period('M')).agg({'DATE': 'last', cols: 'last' }).set_index('DATE') – stvlam22 Nov 28 '22 at 13:02
2

Example

data = {'2005-12-27': 1161.707,
        '2005-12-28': 1164.143,
        '2005-12-29': 1162.635,
        '2006-01-02': 1171.709,
        '2006-01-03': 1184.69,
        '2006-01-04': 1211.699}
s = pd.Series(data)
s.index = pd.to_datetime(s.index)

output(s):

2005-12-27    1161.707
2005-12-28    1164.143
2005-12-29    1162.635
2006-01-02    1171.709
2006-01-03    1184.690
2006-01-04    1211.699
dtype: float64

Code

s.groupby(s.index.to_period('M')).tail(1)

output:

2005-12-29    1162.635
2006-01-04    1211.699
dtype: float64

If s is not sorted by time order, sort index

Panda Kim
  • 6,246
  • 2
  • 12