27

I am trying to calculate a new column which contains maximum values for each of several groups. I'm coming from a Stata background so I know the Stata code would be something like this:

by group, sort: egen max = max(odds) 

For example:

data = {'group' : ['A', 'A', 'B','B'],
    'odds' : [85, 75, 60, 65]}

Then I would like it to look like:

    group    odds    max
     A        85      85
     A        75      85
     B        60      65
     B        65      65

Eventually I am trying to form a column that takes 1/(max-min) * odds where max and min are for each group.

jpp
  • 159,742
  • 34
  • 281
  • 339
Vicki
  • 315
  • 1
  • 3
  • 6

3 Answers3

44

Use groupby + transform:

df['max'] = df.groupby('group')['odds'].transform('max')

This is equivalent to the verbose:

maxima = df.groupby('group')['odds'].max()
df['max'] = df['group'].map(maxima)

The transform method aligns the groupby result to the groupby indexer, so no explicit mapping is required.

jpp
  • 159,742
  • 34
  • 281
  • 339
4

Using the approach from jpp above works, but it also gives a "SettingWithCopyWarning". While this may not be an issue, I believe the code below would remove that warning:

df = df.assign(max = df.groupby('group')['odds'].transform('max')).values
BPC
  • 109
  • 6
  • 1
    Be careful, you are assigning a NumPy array (`values` attribute of a dataframe) to `df`. I don't think that's what you want. – jpp Oct 14 '20 at 10:11
  • I had to convert the NumPy Array to DF again. Still, this was a much faster solution. – Jinto Lonappan Oct 27 '20 at 16:21
0
df['max'] = df.group_col.map(lambda x: df.groupby('group_col').odds.max()[x])
toniitony
  • 93
  • 1
  • 10
  • 2
    It will be better if you can explain a bit your answer. Only code is not acceptable on SO. – Adnan Umer May 12 '17 at 12:28
  • The lambda function does a groupby on group_col and returns the maximum values of the odds column in each group. The indices of these returned values are the name of the group they belong to. So for each element in group_col, we map the appropriate maximum value by doing (lambda x (the group name): groupby_returns_max_values [x]). – toniitony Jun 10 '17 at 08:02
  • A `lambda` function isn't necessary here; you *could* use a series mapping directly. But, better, use `groupby` + `transform` (as show in another answer). – jpp Jan 09 '19 at 19:54
  • 2
    The answer from @jpp is much faster for large dataframes – nick Jan 12 '19 at 21:54