This is a fairly trivial problem, but it's triggering my OCD and I haven't been able to find a suitable solution for the past half hour.

For background, I'm looking to calculate a value (let's call it F) for each group in a DataFrame derived from different aggregated measures of columns in the existing DataFrame.

Here's a toy example of what I'm trying to do:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                'C': [69, 83, 28, 25, 11, 31, 14, 37, 14,  0],
                'D': [ 0.3,  0.1,  0.1,  0.8,  0.8,  0. ,  0.8,  0.8,  0.1,  0.8],
                'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]
                })

df_grp = df.groupby(['A','B'])
df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())

What I'd like to do is assign a name to the result of apply (or lambda). Is there any way to do this without moving the lambda to a named function or renaming the column after running the last line?

JJJ
MrT
  • What is your expected output for the toy data? – Zero Apr 22 '15 at 15:24
  • `5.583333, 2.975000, 3.845455`, which is what the function returns. – MrT Apr 22 '15 at 15:28
  • Like http://stackoverflow.com/a/29778475/2137255 ? – Zero Apr 22 '15 at 15:33
  • Essentially. Is there a way of assigning a name to the result short of defining the function? I'd prefer to use `lambda`. – MrT Apr 22 '15 at 15:41
  • Actually, looking at that link again, it's not exactly what I want. I need the result at the group level only, not the original DataFrame. – MrT Apr 22 '15 at 15:52
  • Ah, not sure yet, but guess this works `df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max()).reset_index(name='your_col_name')` ? Basically, here you are converting your `series` result to a `dataframe` and `name`ing it. – Zero Apr 22 '15 at 15:55
  • That works, and is far better than the alternatives. Do you want to submit it as an answer and I'll check it? – MrT Apr 22 '15 at 16:01

3 Answers

You could convert your Series to a DataFrame using reset_index() and pass name='your_col_name', which sets the name of the column holding the Series values:

(df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())
      .reset_index(name='your_col_name'))

   A  B  your_col_name
0  X  N   5.583333
1  Y  M   2.975000
2  Y  N   3.845455
Zero
Have the lambda function return a new Series:

df_grp.apply(lambda x: pd.Series({'new_name':
                    x['C'].sum() * x['D'].mean() / x['E'].max()}))
# or df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max()).to_frame('new_name')

     new_name
A B          
X N  5.583333
Y M  2.975000
  N  3.845455
Alexander
  • This sort of thing makes me miss R. But, +1, thanks. – Owen Feb 25 '20 at 01:00
  • Not efficient. For example, one groupby.apply computation went from 1'32'' to 2'32'' when a named pd.Series was returned for each group, as in this answer. – Isaías Mar 19 '20 at 23:06
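
If the per-group lambda turns out to be the bottleneck, one possible alternative is to compute each aggregate with named aggregation and combine the resulting columns afterwards. This is only a sketch on the question's toy data: the intermediate names `c_sum`, `d_mean`, `e_max` and the final name `F` are illustrative choices, not something from the posts above, and no timing is claimed for it here.

```python
import pandas as pd

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                   'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                   'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
                   'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
                   'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]})

# One built-in aggregation per measure, instead of a Python call per group.
agg = df.groupby(['A', 'B']).agg(
    c_sum=('C', 'sum'),
    d_mean=('D', 'mean'),
    e_max=('E', 'max'),
)

# Combine the aggregated columns, name the result, and flatten the index.
result = (agg['c_sum'] * agg['d_mean'] / agg['e_max']).rename('F').reset_index()
print(result)
```

The result has one row per (A, B) group with the computed value in a column called `F`, the same values the lambda-based versions produce.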
The accepted answer works because the result of apply here is a Series, and pandas.Series.reset_index() accepts a name parameter. Note that pandas.DataFrame.reset_index() has no name parameter; it has a names argument, but that serves a different purpose (it renames the columns created from the index).

Since the output of apply is a Series in this case, we can also simply use pandas.Series.rename() to achieve the result.

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                'C': [69, 83, 28, 25, 11, 31, 14, 37, 14,  0],
                'D': [ 0.3,  0.1,  0.1,  0.8,  0.8,  0. ,  0.8,  0.8,  0.1,  0.8],
                'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]
                })

df_grp = df.groupby(['A','B'])
df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max()).rename("your_col_name")
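
To make the effect of rename concrete, here is a small sketch comparing it with the reset_index(name=...) approach on the same toy data. It selects the columns `['C', 'D', 'E']` before calling apply, which is my own addition to keep the lambda away from the grouping columns; the variable names `unnamed`, `named`, `flat_a` and `flat_b` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                   'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                   'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
                   'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
                   'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]})

# Assumption: restrict apply to the value columns the lambda actually uses.
grp = df.groupby(['A', 'B'])[['C', 'D', 'E']]
unnamed = grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())

# rename() with a scalar sets the Series name; the (A, B) index is unchanged.
named = unnamed.rename('your_col_name')

# Both routes end in the same flat DataFrame.
flat_a = named.reset_index()
flat_b = unnamed.reset_index(name='your_col_name')
print(flat_a)
```

After rename the result is still a Series indexed by (A, B); calling reset_index() on it afterwards gives the same frame as passing name= directly.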
Sudip Sinha
  • The output of `apply` can be either a DataFrame or a Series ([Docs](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.apply.html)) – Squanchy Jun 05 '23 at 21:55
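
A quick way to see the point about return types is to run the same computation twice, once returning a scalar and once returning a pd.Series per group. This is a sketch on the question's toy data; the column selection before apply and the names `as_series` and `as_frame` are my own assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                   'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                   'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
                   'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
                   'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]})

grp = df.groupby(['A', 'B'])[['C', 'D', 'E']]

# A scalar per group: apply returns a Series, so Series.rename(name) applies.
as_series = grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())

# A Series per group: apply returns a DataFrame whose columns come from
# the labels of the returned Series, so the name is already set there.
as_frame = grp.apply(lambda x: pd.Series(
    {'new_name': x['C'].sum() * x['D'].mean() / x['E'].max()}))

print(type(as_series).__name__, type(as_frame).__name__)
```

So the rename approach fits the scalar-returning lambda from the question, while a lambda that returns a Series already carries its column name.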