Resampling multidimensional data series using all columns

Question

Here's a hard one: (I think)

I have a time series-indexed dataframe with multiple columns. I need to resample the data to be every minute and calculate some metric using the columns. The issue is, the values I want to calculate for each minute, depend on the values in ALL of the columns during that minute. I have tried to use resample('T').apply(func) but it seems like the func is applied to each column separately, not receiving the other columns as input. Instead, I want every call to func to receive a dataframe inclusive of ALL columns for the time indices in that minute, then I'd be able to use use all the values from that minute, and generate a new value based on all of them.

Bonus: I actually need to do this with multiple funcs so the resulting dataframe has multiple columns, with different names than the original, where each value in each column is calculated using the values in all columns over the minute it represents.

Surely there's a way to do this without iterating through rows with the almighty pandas... any ideas ?

EXAMPLE: desired input & output are as follows: (using random data and functions since the real code is for work and can't be shared so the contents of func1, func2 don't matter, just their use of all columns):

import pandas as pd

# functions to calculate (some random calculation using all columns)
def func1(df):
    return df.iloc[:,0].sum() + df.iloc[:,1].mean()

def func2(df):
    return (df.iloc[:,1] - df.iloc[:,2]).mean()

# Dataframe defined with datetime index with precision of 10s
ind = pd.date_range(start='09:00:00', end='09:03:00', freq='10S')
df1 = pd.DataFrame({1: range(len(ind)), 2: range(1, 1+len(ind)), 3:range(2, 2+len(ind))}, index=ind)
print(df1)

# New dataframe with DatetimeIndex at frequency of 1min
new_ind = pd.date_range(start='09:00:00', end='09:03:00', freq='T')
# Different column names to original:
df2 = pd.DataFrame(columns=[4,5], index=new_ind)
for m in range(len(new_ind)-1):
    minute_strt, minute_end = new_ind[m], new_ind[m+1]
    # The following chunk to be processed includes data from ALL 
    # columns of original dataframe within a one-minute span 
    df_to_process = df1.loc[minute_strt:minute_end, :]
    df2.iloc[m+1, :] = [func1(df_to_process), func2(df_to_process)]
print(df2)

Could you, please, provide reproductable example of your data and the expected output? — Punker, Jan 28 '21 at 10:58

Resampling multidimensional data series using all columns

0 Answers0