-1

I have the following dataframe:

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
                              'Parrot', 'Parrot'],
                   'Habitat':['Jungle', 'Jungle',
                              'Sky', 'Sky'],
                   'Tmp':['A', 'B', 'C', 'D'],
                   'Max Speed': [380., 370., 24., 26.]})

>>> df
   Animal Habitat Tmp  Max Speed
0  Falcon  Jungle   A      380.0
1  Falcon  Jungle   B      370.0
2  Parrot     Sky   C       24.0
3  Parrot     Sky   D       26.0

I am trying to add additional column "Output" which will take the value from "Tmp" based on maximum value of column "Max Speed" in a groupby done of columns "Animal" and "Habitat".

Desired output:

   Animal Habitat Tmp  Max Speed Output
0  Falcon  Jungle   A      380.0      A
1  Falcon  Jungle   B      370.0      A
2  Parrot     Sky   C       24.0      D
3  Parrot     Sky   D       26.0      D

It can be done using a groupby and then joining it in the original dataset. But is there a more efficient way to do this? Maybe using transform or something else?

not_speshal
  • 22,093
  • 2
  • 15
  • 30
DumbCoder
  • 233
  • 2
  • 9

1 Answers1

1

You can define a function taking pd.dataframe as argument:

import pandas as pd
import numpy as np

def fmax(df_):
    df_['Output'] = df_.sort_values(['Max Speed']).tail(1)['Tmp'].squeeze()
    return df_

Please note use of pandas.DataFrame.squeeze function to return scalar value. Then simply apply above function using groupby:

df.groupby(['Animal','Habitat']).apply(fmax)

The result is:

   Animal Habitat Tmp  Max Speed Output
0  Falcon  Jungle   A      380.0      A
1  Falcon  Jungle   B      370.0      A
2  Parrot     Sky   C       24.0      D
3  Parrot     Sky   D       26.0      D
ipj
  • 3,488
  • 1
  • 14
  • 18