
I have a pandas dataframe df:

TypeA TypeB timepoint value
A     AB     1         10
A     AB     2         10
A     AC     1         5
A     AC     2         15
A     AC     3         10
...
D     DB     1         1
D     DB     2         1

How can I run a function once for each unique combination of 'TypeA' and 'TypeB' and store the results in a new dataframe? Let's assume the following function:

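import numpy as np
import pandas as pd

def running_mean(x, N):
    # Prefix sums with a leading 0, so cumsum[k] is the sum of the first k values
    cumsum = np.cumsum(np.insert(x, 0, 0))
    # Each window sum of length N is the difference of two prefix sums
    return (cumsum[N:] - cumsum[:-N]) / float(N)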
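To make the prefix-sum trick in `running_mean` concrete, here is a small sketch that runs it on the 'AC' values from the sample (5, 15, 10) with `N = 2`. For comparison it also uses pandas' `Series.rolling(N).mean()`, which is not part of the question; it is shown only to illustrate that it computes the same window means, padded with NaN where the window is not yet full.

```python
import numpy as np
import pandas as pd


def running_mean(x, N):
    # Prepend a 0 so that cumsum[k] is the sum of the first k elements.
    cumsum = np.cumsum(np.insert(x, 0, 0))
    # The sum over each window of length N is a difference of two prefix sums.
    return (cumsum[N:] - cumsum[:-N]) / float(N)


values = np.array([5, 15, 10])
trick = running_mean(values, 2)

# pandas' rolling mean gives the same numbers, padded with NaN for the
# first N - 1 positions where the window is not yet full.
rolling = pd.Series(values).rolling(2).mean()

print(trick)             # [10.  12.5]
print(rolling.tolist())  # [nan, 10.0, 12.5]
```

Note that `running_mean` returns `N - 1` fewer values than its input, which is why the loop below writes its result starting at position 1.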

Normally I would use a for loop, but I don't think that is a good idea (and I miss out on the efficiency of pandas' built-in functions):

# Collect one sub-frame per (TypeA, TypeB) combination
parts = []
for i in df['TypeA'].unique().tolist():
    df2 = df[df['TypeA'] == i]
    for j in df2['TypeB'].unique().tolist():
        # .copy() avoids SettingWithCopyWarning when writing to the slice
        df3 = df2[df2['TypeB'] == j].copy()
        df3['moving_av'] = np.nan
        moving_av = running_mean(df3['value'].values, 2)
        df3.iloc[1:1 + len(moving_av), df3.columns.get_loc('moving_av')] = moving_av
        parts.append(df3)
df5 = pd.concat(parts)

df = pd.merge(df, df5[['TypeA', 'TypeB', 'timepoint', 'moving_av']],
              how='left', on=['TypeA', 'TypeB', 'timepoint'])

My desired output is:

TypeA TypeB timepoint value moving_av
A     AB     1         10     NaN
A     AB     2         10     10
A     AC     1         5      NaN
A     AC     2         15     10
A     AC     3         10     12.5
...
D     DB     1         1      NaN
D     DB     2         1      1

Please note that this function is only an example; I am looking for a solution that works with a bigger function.
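A minimal sketch of the usual pandas approach: group by both key columns and let `SeriesGroupBy.transform` call the function once per group, aligning the results back onto the original rows. It assumes only the sample rows visible above (the elided "..." rows are left out) and that rows within each group should be ordered by `timepoint`. `padded_running_mean` is a hypothetical wrapper: `transform` expects one output value per input row, whereas `running_mean` returns `N - 1` fewer values, so the wrapper pads the front with NaN.

```python
import numpy as np
import pandas as pd


def running_mean(x, N):
    # Function from the question: moving average via prefix sums.
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / float(N)


def padded_running_mean(s, N=2):
    # Hypothetical helper: transform() needs an output the same length as
    # the group, so the first N - 1 positions are filled with NaN.
    out = np.full(len(s), np.nan)
    out[N - 1:] = running_mean(s.to_numpy(), N)
    return pd.Series(out, index=s.index)


# Sample rows taken from the question (the "..." rows are omitted).
df = pd.DataFrame({
    "TypeA": ["A", "A", "A", "A", "A", "D", "D"],
    "TypeB": ["AB", "AB", "AC", "AC", "AC", "DB", "DB"],
    "timepoint": [1, 2, 1, 2, 3, 1, 2],
    "value": [10, 10, 5, 15, 10, 1, 1],
})

# Sort so each group's rows are in time order before the window is applied.
df = df.sort_values(["TypeA", "TypeB", "timepoint"])

# The function runs once per (TypeA, TypeB) group; extra keyword arguments
# to transform() are forwarded to it, and results align back by index.
df["moving_av"] = (
    df.groupby(["TypeA", "TypeB"])["value"]
      .transform(padded_running_mean, N=2)
)
print(df)
```

Any function that takes a Series and returns a result of the same length can be swapped in for `padded_running_mean`, which keeps the per-group logic in one place instead of in nested filtering loops.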

Mark Rotteveel
PV8
  • Your function modifies `df` in place and returns it, you should do one or the other ;) – mozway Aug 31 '23 at 13:40
  • Anyway, you need `df['value_sum'] = df.groupby(['TypeA', 'TypeB'])['value'].transform('sum')`. – mozway Aug 31 '23 at 13:43
  • Please note that the function was only created for the question; it is not the function I am using. I am asking for a general approach to handling this kind of situation with a function and a filtered dataframe. – PV8 Aug 31 '23 at 13:49
  • Then you should provide a more meaningful example, but you can pass an arbitrary function to `transform`. In your updated example you filter and aggregate, is this really what you're trying to do? Because, here again, you shouldn't use a function for that, you should first assign the new "test" column, then use `groupby.transform('sum')` – mozway Aug 31 '23 at 13:55
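The comments point to `groupby(...).transform`. The sketch below, assuming the sample rows visible in the question, shows the `transform('sum')` call from the comment, which broadcasts each group's sum back to every row. For a "bigger" function that needs several columns of a group at once, it also shows `groupby(...).apply`, which the comments do not mention and is offered here only as an alternative. `add_features` and the `share` column are hypothetical.

```python
import pandas as pd

# Sample rows taken from the question (the "..." rows are omitted).
df = pd.DataFrame({
    "TypeA": ["A", "A", "A", "A", "A", "D", "D"],
    "TypeB": ["AB", "AB", "AC", "AC", "AC", "DB", "DB"],
    "timepoint": [1, 2, 1, 2, 3, 1, 2],
    "value": [10, 10, 5, 15, 10, 1, 1],
})

# As in the comment: a named aggregation passed to transform() is computed
# per group and broadcast back to every row of that group.
df["value_sum"] = df.groupby(["TypeA", "TypeB"])["value"].transform("sum")


def add_features(g):
    # Hypothetical "bigger" function that works on the whole sub-DataFrame
    # (here: timepoint and value) and returns rows with the same index.
    g = g.sort_values("timepoint")
    return g.assign(
        moving_av=g["value"].rolling(2).mean(),
        share=g["value"] / g["value"].sum(),
    )


# Selecting the non-key columns keeps the grouping keys out of each group;
# group_keys=False keeps the original row index so the result can be joined.
feats = (
    df.groupby(["TypeA", "TypeB"], group_keys=False)[["timepoint", "value"]]
      .apply(add_features)
)
df = df.join(feats[["moving_av", "share"]])
print(df)
```

`transform` is the better fit when the function maps one column to a same-length result; `apply` is more flexible but typically slower, since it builds a sub-DataFrame for every group.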
