Indexing a data frame after performing an operation on a grouped object and creating a variable accordingly

Question

I have a data frame that looks like this:

 Id  Category  Score
 1     M        0.2
 2     C        0.4
 2     M        0.3
 1     C        0.1
 2     M        0.3
 1     M        0.2
 1     C        0.1
 1     C        0.1
 2     C        0.4

I want to group by Id and Category, then find the max Score, and create a new variable called Category_Label whose rows are equal to the Category at max score index.

The output should look like this

 Id  Category  Score    Category_Label
 1     M        0.2          M
 2     C        0.4          C
 2     M        0.3          C
 1     C        0.1          M
 2     F        0.03         C
 1     M        0.2          M
 1     C        0.1          M  
 1     E        0.01         M
 2     C        0.4          C

In other words, the new variable 'Category_Labelshould be equal to the row ofCategory` that corresponds to the max score of all the 1s

I tried this:

 df[df['Category_Label']] == df.loc[df.groupby(['Id','Category'])['Score'].transform(lambda a: a.max())],'Category'  ]

But I am far away!! I looked into this question and this, but they are not helpful enough.

score 4 · Answer 1 · answered Aug 24 '17 at 18:11

You could use apply and get Score idxmax, use it for Category value. Then use, that for map of Id

In [1591]: df.Id.map(df.groupby('Id').apply(lambda x: x.loc[x.Score.idxmax(), 'Category']))
Out[1591]:
0    M
1    C
2    C
3    M
4    C
5    M
6    M
7    M
8    C
Name: Id, dtype: object

Details

In [1592]: df.groupby('Id').apply(lambda x: x.loc[x.Score.idxmax(), 'Category'])
Out[1592]:
Id
1    M
2    C
dtype: object

In [1593]: df['Category_Label'] = df.Id.map(
                              df.groupby('Id')
                                .apply(lambda x: x.loc[x.Score.idxmax(), 'Category']))

In [1594]: df
Out[1594]:
   Id Category  Score Category_Label
0   1        M    0.2              M
1   2        C    0.4              C
2   2        M    0.3              C
3   1        C    0.1              M
4   2        M    0.3              C
5   1        M    0.2              M
6   1        C    0.1              M
7   1        C    0.1              M
8   2        C    0.4              C

"1594"... Time to restart the REPL :p – cs95 Aug 24 '17 at 18:20 — cs95, Aug 24 '17 at 18:20

piRSquared · Accepted Answer · 2017-08-24T18:26:35.190

4

Use idxmax to find where the max positions are.
Use transform to broadcast across all indices.
Use results in loc to grab Category values
Assign to a new column

df.assign(
    Category_Label=df.loc[
        df.groupby('Id').Score.transform('idxmax'),
        'Category'
    ].values
)

   Id Category  Score Category_Label
0   1        M    0.2              M
1   2        C    0.4              C
2   2        M    0.3              C
3   1        C    0.1              M
4   2        M    0.3              C
5   1        M    0.2              M
6   1        C    0.1              M
7   1        C    0.1              M
8   2        C    0.4              C

edited Aug 24 '17 at 18:26

answered Aug 24 '17 at 18:17

piRSquared

285,575
57
475
624

@JohnGalt reading the question, OP said by `['Id', 'Category']`. However, that makes no sense. Thank you for pointing that out. – piRSquared Aug 24 '17 at 18:25
I like your solution, this is much cleaner. – Zero Aug 24 '17 at 18:28

score 1 · Answer 3 · answered Aug 24 '17 at 18:22

Or you can try without groupby

df.merge(df.sort_values(['Id','Score']).\
   drop_duplicates(['Id'],keep ='last')[['Id','Category']],on='Id',how='left').\
   rename(columns={'Category_x':'Category','Category_y':'Category_Label'})



Out[176]: 
   Id Category  Score Category_Label
0   1        M   0.20          M
1   2        C   0.40          C
2   2        M   0.30          C
3   1        C   0.10          M
4   2        F   0.03          C
5   1        M   0.20          M
6   1        C   0.10          M
7   1        E   0.01          M
8   2        C   0.40          C

Indexing a data frame after performing an operation on a grouped object and creating a variable accordingly

3 Answers3