Use Pandas groupby() + apply() with arguments

Question

I would like to use df.groupby() in combination with apply() to apply a function to each row per group.

I normally use the following code, which usually works (note that this is without groupby()):

df.apply(myFunction, args=(arg1,))

With the groupby() I tried the following:

df.groupby('columnName').apply(myFunction, args=(arg1,))

However, I get the following error:

TypeError: myFunction() got an unexpected keyword argument 'args'

Hence, my question is: How can I use groupby() and apply() with a function that needs arguments?

This would work with `df.groupby('columnName').apply(myFunction, ('arg1'))` – Zero, Sep 11 '17 at 13:30
@Zero this is a great answer, as it is very similar to the OP's attempted solution and doesn't require a lambda. I suggest you post it as an answer. – DontDivideByZero, Oct 16 '17 at 09:19
@Zero, I have the very same question as the OP, but this doesn't work for me; I still get the very same error as the OP. Also, may I ask why your comment should work and why the OP's approach (which is the same as mine) doesn't? I haven't found it documented anywhere. – Pythonista anonymous, Oct 16 '17 at 12:22
Try `.apply(myFunction, args=('arg1',))`; note the `,` after `arg1`. – beta, Oct 17 '17 at 10:22
Actually, I just tried it myself and it doesn't work either... – beta, Oct 17 '17 at 10:29

MaxU - stand with Ukraine · Accepted Answer · 2017-10-16T13:09:19.043

score 59

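pandas.core.groupby.GroupBy.apply does NOT have a named parameter args; it takes *args and **kwargs and forwards them to the function, so args=(arg1,) reaches myFunction as a keyword argument named args. pandas.DataFrame.apply, by contrast, does have an args parameter.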

So try this:

df.groupby('columnName').apply(lambda x: myFunction(x, arg1))

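or pass the extra argument positionally, as suggested by @Zero (note that ('arg1') is just a parenthesized string, not a tuple; it is forwarded to myFunction as-is):

df.groupby('columnName').apply(myFunction, ('arg1'))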

Demo:

In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))

In [83]: df
Out[83]:
   a  b  c
0  0  3  1
1  0  3  4
2  3  0  4
3  4  2  3
4  3  4  1

In [84]: def f(ser, n):
    ...:     return ser.max() * n
    ...:

In [85]: df.apply(f, args=(10,))
Out[85]:
a    40
b    40
c    40
dtype: int64

When using GroupBy.apply you can pass either named arguments:

In [86]: df.groupby('a').apply(f, n=10)
Out[86]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30

or positional arguments (note that (10) is just the integer 10, not a one-element tuple):

In [87]: df.groupby('a').apply(f, (10))
Out[87]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30
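
The options above can be put side by side in one script. This is a minimal sketch, not part of the original demo: scale_max is a hypothetical stand-in for myFunction, the frame reuses the demo values, and columns b and c are selected explicitly so that the result does not depend on how a given pandas version treats the grouping column. functools.partial is an additional standard-library alternative to the lambda.

```python
from functools import partial

import pandas as pd

# Same values as the demo frame above, written out explicitly
df = pd.DataFrame({
    'a': [0, 0, 3, 4, 3],
    'b': [3, 3, 0, 2, 4],
    'c': [1, 4, 4, 3, 1],
})


def scale_max(group, n):
    """Hypothetical stand-in for myFunction: per-column max of a group, times n."""
    return group.max() * n


# Select the value columns so the grouping column is not part of each group
grouped = df.groupby('a')[['b', 'c']]

by_keyword = grouped.apply(scale_max, n=10)             # forwarded via **kwargs
by_position = grouped.apply(scale_max, 10)              # forwarded via *args
by_lambda = grouped.apply(lambda g: scale_max(g, 10))   # argument bound in the lambda
by_partial = grouped.apply(partial(scale_max, n=10))    # argument bound with partial

print(by_keyword)
```

All four calls should produce the same frame, so the choice between them is mostly a matter of readability.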

edited Oct 16 '17 at 13:09

answered Apr 18 '17 at 22:41

Are you sure there's no way to pass an `args` parameter here in a tuple? I've seen that used on `.apply` elsewhere and it obviates the need for a lambda expression. – Brad Solomon Sep 28 '17 at 17:04
@BradSolomon see Zero's answer in the question comments – DontDivideByZero Oct 16 '17 at 09:24
Why does this work, while what the OP did doesn't? I'm not following, and I couldn't find it documented anywhere. – Pythonista anonymous Oct 16 '17 at 12:25
@Pythonistaanonymous, now you have even two answers answering your question :-D – MaxU - stand with Ukraine Oct 16 '17 at 13:10
In my case, I have a function that needs to be given two consecutive rows. How should I do it? `df.groupby('columnName').apply(lambda x: myFunction(x, shift(-1)))`? – Mehdi Abbassi Oct 14 '20 at 13:57
@MehdiAbbassi, try this: `df.groupby('columnName').apply(lambda x: myFunction(x, x.shift(-1)))` ;) – MaxU - stand with Ukraine Oct 14 '20 at 14:28
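
A small sketch of the shift(-1) idea from the comment above, under some assumptions: the frame, the column names key and value, and the function step_to_next are hypothetical, and group_keys=False is used only so that the result keeps the original row index.

```python
import pandas as pd

# Hypothetical data: two groups with a numeric column
df = pd.DataFrame({
    'key': ['a', 'a', 'a', 'b', 'b'],
    'value': [1, 4, 9, 10, 15],
})


def step_to_next(current, following):
    """Hypothetical two-argument function: difference to the next row."""
    return following - current


# Within each group, x.shift(-1) aligns every row with the row that follows it;
# the last row of each group has no successor and yields NaN
steps = (
    df.groupby('key', group_keys=False)['value']
      .apply(lambda x: step_to_next(x, x.shift(-1)))
)

print(steps)
```

Because the shift happens inside the lambda, it is applied per group, so the last row of one group is never paired with the first row of the next.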

Brad Solomon · Answer 2 · 2017-10-16T13:08:31.400

Some confusion here over why using an args parameter throws an error might stem from the fact that pandas.DataFrame.apply does have an args parameter (a tuple), while pandas.core.groupby.GroupBy.apply does not.

So, when you call .apply on a DataFrame itself, you can use this argument; when you call .apply on a groupby object, you cannot.

In @MaxU's answer, the expression lambda x: myFunction(x, arg1) is passed to func (the first parameter); there is no need to specify additional *args/**kwargs because arg1 is specified in lambda.

An example:

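import numpy as np
import pandas as pd

# Small illustrative frame (values are arbitrary)
df = pd.DataFrame({'col1': [1, 1, 2], 'col2': [3, 4, 5]})

# Called on a DataFrame - `axis` is DataFrame.apply's own parameter
df.apply(np.sum, axis=0)  # equiv to df.sum(0)
df.apply(np.sum, axis=1)  # equiv to df.sum(1)

# DataFrame.apply also accepts `args`, a tuple forwarded to the function
df.apply(np.sum, args=(0,))  # calls np.sum(column, 0) for each column

# Called on groupby object of the DataFrame - will throw TypeError,
# because `args=` is forwarded to np.sum as a keyword argument
print(df.groupby('col1').apply(np.sum, args=(0,)))
# TypeError: sum() got an unexpected keyword argument 'args'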
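
The difference can also be mimicked in plain Python, without pandas. The two helper functions below are simplified, hypothetical models of the two call shapes (the real methods take more parameters); they only illustrate why args=(...) works in one case and is passed straight through to the function in the other.

```python
def dataframe_style_apply(func, data, args=(), **kwargs):
    """Simplified model of DataFrame.apply: extras arrive packed in `args`."""
    return func(data, *args, **kwargs)


def groupby_style_apply(func, data, *args, **kwargs):
    """Simplified model of GroupBy.apply: extras are forwarded exactly as given."""
    return func(data, *args, **kwargs)


def scale_max(values, n):
    """Hypothetical stand-in for myFunction."""
    return max(values) * n


values = [1, 2, 3]

# `args` is a real parameter here, so the tuple is unpacked before the call
via_args_tuple = dataframe_style_apply(scale_max, values, args=(10,))

# No `args` parameter here: pass extras positionally or by name instead
via_positional = groupby_style_apply(scale_max, values, 10)
via_keyword = groupby_style_apply(scale_max, values, n=10)

# `args=(10,)` lands in **kwargs and reaches scale_max as an unknown keyword
try:
    groupby_style_apply(scale_max, values, args=(10,))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(via_args_tuple, via_positional, via_keyword)
print(error_message)
```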

score 6 · Answer 3 · answered Dec 07 '18 at 09:48

For me, the following worked:

df2 = df.groupby('columnName').apply(lambda x: my_function(x, arg1, arg2))

Hitesh Somani