39

Is there a way to tell the groupby() call to pass the group name to the apply() lambda function?

When I iterate over the groups, I can get the group key through tuple unpacking:

for group_name, subdf in temp_dataframe.groupby(level=0, axis=0):
    print(group_name)

...is there a way to also get the group name in the apply function, such as:

temp_dataframe.groupby(level=0,axis=0).apply(lambda group_name, subdf: foo(group_name, subdf))

How can I get the group name as an argument for the apply lambda function?

smci
user1129988

2 Answers

61

I think you should be able to use the name attribute:

temp_dataframe.groupby(level=0,axis=0).apply(lambda x: foo(x.name, x))

This should work. For example:

In [132]:
df = pd.DataFrame({'a':list('aabccc'), 'b':np.arange(6)})
df

Out[132]:
   a  b
0  a  0
1  a  1
2  b  2
3  c  3
4  c  4
5  c  5

In [134]:
df.groupby('a').apply(lambda x: print('name:', x.name, '\nsubdf:',x))

name: a 
subdf:    a  b
0  a  0
1  a  1
name: b 
subdf:    a  b
2  b  2
name: c 
subdf:    a  b
3  c  3
4  c  4
5  c  5
Out[134]:
Empty DataFrame
Columns: []
Index: []
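
If you would rather not depend on the `.name` attribute being set on each sub-frame, a minimal sketch of an alternative is to iterate over the groups and pass the key explicitly. Here `foo` is a hypothetical stand-in for the function from the question, and collecting the results in a dict is an assumption about what you want back:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('aabccc'), 'b': np.arange(6)})

def foo(group_name, subdf):
    # Hypothetical helper: label the sum of column 'b' with the group key.
    return f"{group_name}:{subdf['b'].sum()}"

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs,
# so the key can be handed to foo directly.
result = {name: foo(name, subdf) for name, subdf in df.groupby('a')}
print(result)
```

This trades the single apply call for an explicit loop, but the group key arrives as an ordinary argument.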
EdChum
  • 2
    Good one - how about `transform` though? – Mr_and_Mrs_D Nov 26 '18 at 02:36
  • @Mr_and_Mrs_D sorry don't understand your question, if you replace `apply` with `transform` then it does the same thing – EdChum Nov 26 '18 at 11:46
  • Thanks - so `x.name` would work with transform too? I am using transform on a groupby series and i need the key of the groupby to use in a dict - I am doing something as ugly as `df['value'] = df.groupby(['id'])['id'].transform(lambda col: id_to_value_dict[col.unique()[0]])` – Mr_and_Mrs_D Nov 26 '18 at 13:22
  • 1
    If you want the group names you can call `.groups` and from this get the keys so `df.groupby(['id']).groups.keys()` it's a bit difficult for me to answer without a concrete example and desired result to see – EdChum Nov 26 '18 at 13:45
  • Just to add to this answer: The 'name' attribute cannot be used as if it were a column - `x['name']` will fail whereas `x.name` works. You can get at the name by using the column attribute (ie. `x[]` which returns the entire `pd.Series`, or `x.iloc[0][]` to get the first element of the Series) – cbcoutinho Feb 05 '19 at 13:40
  • @Mr_and_Mrs_D @EdChum, *no*, with `transform` and `agg`, the function takes a **Series** as input, so the `.name` attribute will be the Series name (i.e. the column), not the group name, as explained in the [answer below](https://stackoverflow.com/a/55869091) – PlasmaBinturong Nov 07 '19 at 14:17
  • 1
    @PlasmaBinturong in the case it's a SeriesGroupBy, its `name` attribute points to the groupby key inside the transform - I used it that way IIRC – Mr_and_Mrs_D Nov 07 '19 at 15:13
  • Try `d=pd.DataFrame({'a': ['id1','id2','id3', 'id3'], 'b': [3,6,78,6]}); s=d.groupby('a')['b']; s.transform(lambda x: print(x.name, type(x)))` – Mr_and_Mrs_D Nov 07 '19 at 17:21
  • Is this behavior documented anywhere? I don't see it mentioned in the documentation for `pandas.DataFrame.groupby` or `pandas.core.groupby.generic.DataFrameGroupBy.apply`. – shadowtalker Apr 20 '23 at 21:58
7

For those who came looking for an answer to the question:

Including the group name in the transform function pandas python

and ended up in this thread, please read on.

Given the following input:

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

Data:

    col1    col2    col3
0   a       0       0
1   a       1       1
2   b       2       2
3   c       3       3
4   c       4       4
5   c       5       5

We can access the group name (which is visible in the scope of the enclosing apply lambda) like this:

df.groupby('col1') \
.apply(lambda frame: frame \
       .transform(lambda col: col + 3 if frame.name == 'a' and col.name == 'col2' else col))

Output:

    col1    col2    col3
0   a       3       0
1   a       4       1
2   b       2       2
3   c       3       3
4   c       4       4
5   c       5       5

Note that the call to apply is needed to obtain a reference to the sub-frame (i.e. frame, a pandas.core.frame.DataFrame), whose name attribute holds the key of the corresponding group. The name attribute of the argument of transform (i.e. col) refers to the column/Series name instead.

Alternatively, one could also loop over the groups and then, within each group, over the columns:

for grp_name, sub_df in df.groupby('col1'):
    for col in sub_df:
        if grp_name == 'a' and col == 'col2':
            df.loc[df.col1 == grp_name, col] = sub_df[col] + 3

My use case is quite rare and this was the only way to achieve my goal (as of pandas v0.24.2). However, I'd recommend exploring the pandas documentation thoroughly because there most likely is an easier vectorised solution to what you may need this construct for.
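
For the specific toy example in this answer (add 3 to col2 only where col1 is 'a'), a vectorised version could look like the sketch below. It assumes the condition can be written as a plain boolean mask over the grouping column, which may not hold for more involved per-group logic; the name `mask` is just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

# Boolean mask selecting the rows that belong to group 'a'.
mask = df['col1'] == 'a'

# Update col2 in place for those rows only; all other cells are untouched.
df.loc[mask, 'col2'] += 3
print(df)
```

No groupby is involved here at all, which is often the simplest option when the group key is itself a column value.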

emem
rapture