I have a data frame with some numerical values and a date-timestamp.
What I would like to do is aggregate the data into monthly intervals outputting a max percentile value for each month.
What I have been doing so far is just using:
df = df.resample('M', on='ds').max()
Which gives me the max value for that month. However, from what I can see in my data there are usually one or two spikes in each month. The result is that by using max()
I will get that spike value - which is not correct. So I way to filter out the few high value peaks I was wondering if I could use a percentile function instead of max()
, .e.g:
np.percentile(df['y'], 99)
As far as I can see the resample
function does not provide the option to use own functions. But I might be wrong? In any case, how can this be accomplished ?