I am trying to write a function that resamples time series data in pandas. I would like to be able to specify the type of aggregation depending on the data I am passing in (e.g. for some data, taking the sum of each bin is appropriate, while for others the mean is needed). For example, with data like this:

import pandas as pd
import numpy as np

dr = pd.date_range('01-01-2020', '01-03-2020', freq='1H')
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)

I could have a function like this:

def process(df, freq='3H', method='sum'):
    r = df.resample(freq)
    if method == 'sum':
        r = r.sum()
    elif method == 'mean':
        r = r.mean()
    #...
    #more options
    #...
    return r

For a small number of aggregation methods this is fine, but it seems like it would get tedious if I wanted to select from all of the possible choices.

I was hoping to use getattr to implement something like this post (under "Putting it to work: generalizing method calls"). However, I can't find a way to do this:

def process2(df, freq='3H', method='sum'):
    r = df.resample(freq)
    foo = getattr(r, method)
    return r.foo()

#fails with:
#AttributeError: 'DatetimeIndexResampler' object has no attribute 'foo'

def process3(df, freq='3H', method='sum'):
    r = df.resample(freq)
    foo = getattr(r, method)
    return foo(r)

#fails with:
#TypeError: __init__() missing 1 required positional argument: 'obj'

I get why process2 fails (calling r.foo() looks for the method foo() of r, not the variable foo). But I don't think I get why process3 fails.

I know another approach would be to pass functions to the parameter method, and then apply those functions to r. My suspicion is that this would be less efficient, and it still doesn't let me access the built-in Resampler methods directly.

Is there a working, more concise way to achieve this? Thanks!

Tom
  • Try `.resample().apply(method)` – RichieV Aug 12 '20 at 21:51
  • @RichieV Thank you! I somehow missed this when looking at [the documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.resample.Resampler.apply.html#pandas.core.resample.Resampler.apply). If you answer I will accept. – Tom Aug 12 '20 at 22:09
  • Try it first and let me know if it works, I don't know if `method` can be a string or if it has to be a reference to the actual method – RichieV Aug 12 '20 at 22:26
  • Yes, it works, and it is in the docs for `apply`. As far as I can tell, `r.apply('sum')` is the same as `r.sum()`. In timing tests they seem to be equivalent – Tom Aug 12 '20 at 23:08

1 Answer

Try `.resample().apply(method)`
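As a minimal sketch of that suggestion, assuming `method` is the name of a built-in aggregation such as 'sum' or 'mean'. The function name `process_apply` and the minute-based frequency strings are illustrative choices, not taken from the question:

```python
import numpy as np
import pandas as pd

# Sample data similar to the question's: hourly values over two days.
# "60min" is used here instead of "1H" only as an assumption, to avoid
# alias differences between pandas versions.
dr = pd.date_range("2020-01-01", "2020-01-03", freq="60min")
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)


def process_apply(df, freq="180min", method="sum"):
    """Resample df and aggregate each bin with the named method."""
    # Resampler.apply accepts a string naming an aggregation.
    return df.resample(freq).apply(method)


summed = process_apply(df, method="sum")
averaged = process_apply(df, method="mean")
```

As noted in the comments on the question, `r.apply('sum')` appears to give the same result as `r.sum()`.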

But unless you are planning some more computation inside the function, it will probably be easier to just hard-code this line.
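If you would rather keep the `getattr` idea from the question, something along these lines should also work. `getattr(r, method)` returns a method that is already bound to `r`, so it is called with no arguments; passing `r` again, as `process3` does, hands the resampler in as an unexpected extra argument. The name `process_getattr` is hypothetical, and the frequency strings are again an assumption for compatibility across pandas versions:

```python
import numpy as np
import pandas as pd

dr = pd.date_range("2020-01-01", "2020-01-03", freq="60min")
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)


def process_getattr(df, freq="180min", method="sum"):
    """Look up the aggregation by name on the resampler and call it."""
    r = df.resample(freq)
    agg = getattr(r, method)  # bound method, e.g. r.sum
    return agg()              # no arguments: r is already bound


result = process_getattr(df, method="mean")
```

A misspelled method name raises `AttributeError` here, which may or may not be the behaviour you want.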

RichieV