Resampling with percentiles

Question

I have a data frame with some numerical values and a date-timestamp.

What I would like to do is aggregate the data into monthly intervals outputting a max percentile value for each month.

What I have been doing so far is just using:

df = df.resample('M', on='ds').max()

Which gives me the max value for that month. However, from what I can see in my data there are usually one or two spikes in each month. The result is that by using max() I will get that spike value - which is not correct. So I way to filter out the few high value peaks I was wondering if I could use a percentile function instead of max(), .e.g:

np.percentile(df['y'], 99)

As far as I can see the resample function does not provide the option to use own functions. But I might be wrong? In any case, how can this be accomplished ?

score 2 · Accepted Answer · answered May 07 '21 at 11:58

2

Use custom lambda function in GroupBy.agg:

df = df.resample('M', on='ds')['y'].agg(lambda x: np.percentile(x, 99))

answered May 07 '21 at 11:58

jezrael

822,522
95
1,334
1,252

1

@George - I see it, not idea. – jezrael May 07 '21 at 13:01
1

This worked very well, just had to add reset_index() at the end to keep the 'y' label on that column. – Denver Dang May 07 '21 at 16:02

Resampling with percentiles

1 Answers1