I have a simple DataFrame Object:

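import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random_sample((5, 5)))
df["col"] = ["A", "B", "C", "A", "B"]

# simple function: one row of 3 random values scaled by param
def func_apply(df, param=1):
    return pd.Series(np.random.random(3) * param, name=str(param))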

Applying the function gives the expected DataFrame:

df.groupby('col').apply(func_apply)

    1           0         1         2
col                              
A    0.928527  0.383567  0.085651
B    0.567423  0.668644  0.689766
C    0.301774  0.156021  0.222140

Is there a way to pass a parameter list to the groupby to get something like this?

#Pseudocode...
df.groupby('col').apply(func_apply, params=[1,2,10])

    1           0         1         2
par col                              
1    A    0.928527  0.383567  0.085651
1    B    0.567423  0.668644  0.689766
1    C    0.301774  0.156021  0.222140
2    A    0.526494  1.812780  1.515816
2    B    1.180539  0.527171  0.670796
2    C    1.507721  0.156808  1.695386
10   A    7.986563  5.109876  2.330171
10   B    2.096963  6.804624  2.351397
10   C    6.890758  8.079466  1.725226

Thanks a lot for any hint :)

MichaelRazum
  • So you want to "repeat" the subdataframe, right? – Willem Van Onsem Jun 28 '19 at 22:06
  • That was pseudocode. I would like to apply the function with different parameters to the DataFrame, so basically a combination of the first results, just with different parameters. Right now I added only one, but in my use case there are even more. I know this would work if the parameter were inside the DataFrame, because then it could be added to the groupby statement. PS: I hope it is clear; I just added the expected df. The values increase with a higher parameter value. – MichaelRazum Jun 28 '19 at 22:09
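The idea in the last comment, putting the parameter inside the DataFrame so it can become a groupby key, can be sketched with a cross join. This is only a sketch under some assumptions: merge(how="cross") needs pandas 1.2 or later, the question's func_apply is used with its missing return added, and the parameter is read from group.name, which pandas sets to the group key (here a (par, col) tuple) on each group passed to apply:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random_sample((5, 5)))
df["col"] = ["A", "B", "C", "A", "B"]

def func_apply(df, param=1):
    # One row of 3 random values scaled by param
    return pd.Series(np.random.random(3) * param, name=str(param))

params = [1, 2, 10]

# Cross join: every original row is repeated once per parameter value
expanded = df.merge(pd.DataFrame({"par": params}), how="cross")

# Group by parameter and category; group.name is the (par, col) key tuple
out = expanded.groupby(["par", "col"]).apply(
    lambda group: func_apply(group, param=group.name[0])
)
print(out)
```

The cost is that the data is physically copied once per parameter, which may matter for large frames.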

1 Answer


IIUC,

apply accepts additional parameters: any extra positional or keyword arguments you pass to GroupBy.apply after the function are forwarded to that function for every group. How you use them is up to you; it depends on how you write your function to produce the desired output.
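A small sketch of this forwarding, using hypothetical toy data and a hypothetical scaled_sum function chosen only for illustration. The extra positional argument 10 and the keyword argument offset=1.0 reach the function for each group. It selects the "v" column so the function receives just that Series:

```python
import pandas as pd

# Hypothetical toy data, only to show argument forwarding
df = pd.DataFrame({"col": ["A", "B", "A"], "v": [1.0, 2.0, 3.0]})

def scaled_sum(values, factor, offset=0.0):
    # values: the "v" Series of one group; factor is passed positionally,
    # offset as a keyword argument
    return values.sum() * factor + offset

# Everything after the function is forwarded to scaled_sum for each group
res = df.groupby("col")["v"].apply(scaled_sum, 10, offset=1.0)
print(res)
```

Group A sums to 4.0 and group B to 2.0, so the result is 41.0 for A and 21.0 for B.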

Let's take your sample data. I modified your func_apply as follows so that, for each group, it builds one row per value in params and combines those rows into the final output:

def func_apply(df, params=(1,)):
    # One row per parameter; par only sets the row label, the values are not scaled
    d = [pd.Series(np.random.random(3), name=str(par), index=['x', 'y', 'z']) for par in params]
    return pd.DataFrame(d)
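This func_apply uses par only as the row label, so the values do not grow with the parameter as in the question's expected output. A minimal variant that also scales the values, assuming a hypothetical name func_apply_scaled and the question's sample df, might look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random_sample((5, 5)))
df["col"] = ["A", "B", "C", "A", "B"]

def func_apply_scaled(group, params=(1,)):
    # One row per parameter; values are drawn from [0, par),
    # so they increase with the parameter
    rows = [
        pd.Series(np.random.random(3) * par, name=str(par), index=["x", "y", "z"])
        for par in params
    ]
    return pd.DataFrame(rows)

out = df.groupby("col").apply(func_apply_scaled, params=[1, 2, 10])
print(out)
```

The layout is the same as below (col outer, parameter inner); only the value ranges change.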

Now call apply with func_apply and pass [1, 2, 10] to it as the keyword argument params:

df.groupby('col').apply(func_apply, params=[1, 2, 10])

Out[1102]:
               x         y         z
col
A   1   0.074357  0.850912  0.652096
    2   0.307986  0.267658  0.558153
    10  0.351000  0.743816  0.192400
B   1   0.179359  0.411784  0.535644
    2   0.905294  0.696661  0.794458
    10  0.635706  0.742784  0.963603
C   1   0.020375  0.693070  0.225971
    2   0.448988  0.288206  0.715875
    10  0.980669  0.474264  0.036715

Without passing params, func_apply falls back to its default of a single parameter, 1:

df.groupby('col').apply(func_apply)

Out[1103]:
              x         y         z
col
A   1  0.499484  0.175008  0.331594
B   1  0.052399  0.965129  0.649668
C   1  0.053869  0.297008  0.793262
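If you want exactly the layout from the question, with par as the outer index level, a different approach from the one above is to call apply once per parameter with the question's single-parameter func_apply and stack the results with pd.concat, whose names argument labels the new outer level. A sketch, assuming the question's function with the missing return added:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random_sample((5, 5)))
df["col"] = ["A", "B", "C", "A", "B"]

def func_apply(df, param=1):
    # One row of 3 random values scaled by param
    return pd.Series(np.random.random(3) * param, name=str(param))

params = [1, 2, 10]

# One groupby-apply per parameter; the dict keys become the outer "par" level
result = pd.concat(
    {p: df.groupby("col").apply(func_apply, param=p) for p in params},
    names=["par"],
)
print(result)
```

This runs one groupby pass per parameter, which keeps func_apply simple at the cost of repeated grouping.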
Andy L.