Set column name for apply result over groupby

Question

This is a fairly trivial problem, but it's triggering my OCD and I haven't been able to find a suitable solution for the past half hour.

For background, I'm looking to calculate a value (let's call it F) for each group in a DataFrame derived from different aggregated measures of columns in the existing DataFrame.

Here's a toy example of what I'm trying to do:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                'C': [69, 83, 28, 25, 11, 31, 14, 37, 14,  0],
                'D': [ 0.3,  0.1,  0.1,  0.8,  0.8,  0. ,  0.8,  0.8,  0.1,  0.8],
                'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]
                })

df_grp = df.groupby(['A','B'])
df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())

What I'd like to do is assign a name to the result of apply (or lambda). Is there any way to do this without moving the lambda to a named function or renaming the column after running the last line?

`5.583333, 2.975000, 3.845455`, which is what the function returns. — MrT, Apr 22 '15 at 15:28
Essentially. Is there a way of assigning a name to the result short of defining the function? I'd prefer to use `lambda`. — MrT, Apr 22 '15 at 15:41
Actually, looking at that link again, it's not exactly what I want. I need the result at the group level only, not the original DataFrame. — MrT, Apr 22 '15 at 15:52
Ah, not sure yet, but guess this works `df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max()).reset_index(name='your_col_name')` ? Basically, here you are converting your `series` result to a `dataframe` and `name`ing it. — Zero, Apr 22 '15 at 15:55
That works, and is far better than the alternatives. Do you want to submit it as an answer and I'll check it? — MrT, Apr 22 '15 at 16:01

score 50 · Accepted Answer · answered Apr 22 '15 at 16:04

You could convert your Series to a DataFrame using reset_index() and pass name='your_col_name', which sets the name of the column that holds the Series values:

(df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())
      .reset_index(name='your_col_name'))

   A  B  your_col_name
0  X  N   5.583333
1  Y  M   2.975000
2  Y  N   3.845455
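
To spell out what reset_index() is doing here: apply returns a Series whose index is the (A, B) group keys, and Series.reset_index moves those index levels into ordinary columns while name= labels the column of values. Below is a minimal sketch of that, assuming a reasonably recent pandas. The column selection [['C', 'D', 'E']] and the name 'F' are my additions for illustration (the selection keeps the grouping columns out of the lambda), not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                   'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                   'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
                   'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
                   'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]})

# Assumption: selecting C, D, E explicitly so the lambda never sees the
# grouping columns; the original code applies to the whole group frame.
result = df.groupby(['A', 'B'])[['C', 'D', 'E']].apply(
    lambda x: x['C'].sum() * x['D'].mean() / x['E'].max()
)

# Each group produced one scalar, so `result` is a Series indexed by (A, B).
print(type(result).__name__)
print(result.index.names)

# Series.reset_index turns the index levels into columns; `name` labels
# the column that holds the Series values ('F' is a hypothetical name).
named = result.reset_index(name='F')
print(named.columns.tolist())
```

Note that name= belongs to Series.reset_index; DataFrame.reset_index does not accept it, which is likely the source of the confusion in the comments.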

answered Apr 22 '15 at 16:04

Zero

This answer worked better for me than the accepted one. – Edgar Sep 08 '20 at 15:40
If you happen to stop by again sometime - a bit of prose about what's going on with that `reset_index()` would be additionally helpful – WestCoastProjects Feb 20 '23 at 17:07
apparently the `name` parameter has been removed. maybe it's `names` now - given more recent `multi-index` support – WestCoastProjects Feb 20 '23 at 17:13

Alexander · Answer 2 · 2020-02-25T17:30:02.657

49

Have the lambda function return a new Series:

df_grp.apply(lambda x: pd.Series({'new_name':
                    x['C'].sum() * x['D'].mean() / x['E'].max()}))
# or df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max()).to_frame('new_name')

     new_name
A B          
X N  5.583333
Y M  2.975000
  N  3.845455

edited Feb 25 '20 at 17:30

answered Apr 22 '15 at 16:22

Alexander

This sort of thing makes me miss R. But, +1, thanks. – Owen Feb 25 '20 at 01:00
Not efficient. For example, one groupby.apply computation went from 1'32'' up to 2'32'' when a named pd.Series was created on each iteration, as in this answer. – Isaías Mar 19 '20 at 23:06
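
For what it's worth, the two variants in this answer give the same frame; the difference is only where the name gets attached. Here is a rough sketch comparing them, with no timing claims of my own. The helper name f_value and the [['C', 'D', 'E']] selection are hypothetical additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                   'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                   'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
                   'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
                   'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]})

# Hypothetical helper so both variants share the same formula.
def f_value(x):
    return x['C'].sum() * x['D'].mean() / x['E'].max()

grouped = df.groupby(['A', 'B'])[['C', 'D', 'E']]

# Variant 1: build a one-element named Series inside every group call.
per_group = grouped.apply(lambda x: pd.Series({'new_name': f_value(x)}))

# Variant 2: return a plain scalar per group, then name the column once.
named_once = grouped.apply(f_value).to_frame('new_name')

print(per_group)
print(named_once)
```

Variant 2 constructs no extra Series per group, so if the overhead described in the comment above matters for your data, it is probably the one to try first.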

score 3 · Answer 3 · answered Feb 28 '23 at 00:11

The accepted answer seems to work with the current version of pandas, but name is not listed among the parameters of DataFrame.reset_index in the documentation; it is a parameter of Series.reset_index, which is the method actually being called here. DataFrame.reset_index does have a names argument, but it serves a different purpose IMO.

Since the output of apply here is a Series, we can simply use pandas.Series.rename() to achieve the result.

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                'C': [69, 83, 28, 25, 11, 31, 14, 37, 14,  0],
                'D': [ 0.3,  0.1,  0.1,  0.8,  0.8,  0. ,  0.8,  0.8,  0.1,  0.8],
                'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]
                })

df_grp = df.groupby(['A','B'])
df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max()).rename("your_col_name")
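
To show what rename() buys you: passing a scalar to Series.rename sets the Series name, and that name then becomes the column label if you later flatten the result. A small sketch, where the [['C', 'D', 'E']] selection and the variable names are my own assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                   'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                   'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
                   'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
                   'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]})

# A scalar argument to Series.rename sets the Series name; the (A, B)
# index is left untouched.
named = (df.groupby(['A', 'B'])[['C', 'D', 'E']]
           .apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())
           .rename('your_col_name'))
print(named.name)

# If a flat DataFrame is wanted afterwards, the Series name is reused
# as the label of the value column.
flat = named.reset_index()
print(flat.columns.tolist())
```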

The output of `apply` can be either a DataFrame or a Series ([Docs](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.apply.html)) — Squanchy, Jun 05 '23 at 21:55
