10

Im trying to retrieve only the max values (including the multi index values) from a pandas dataframe that has multiple indexes. The dataframe I have is generated via a groupby and column selection ('tOfmAJyI') like this:

df.groupby('id')['tOfmAJyI'].value_counts()

Out[4]: 
id     tOfmAJyI
3      mlNXN       4
       SSvEP       2
       hCIpw       2
5      SSvEP       2
       hCIpw       1
       mlNXN       1
11     mlNXN       2
       SSvEP       1
...

What I would like to achieve is to get the max values including their corresponding index values. So something like:

id     tOfmAJyI
3      mlNXN       4
5      SSvEP       2
11     mlNXN       2
...

Any ideas how I can achieve this? I was able to get the id and max value but I'm still trying to get the corresponding value of 'tOfmAJyI'.

cmaher
  • 5,100
  • 1
  • 22
  • 34

4 Answers4

11

groupby + head

df.groupby(level=0).head(1)
Out[1882]: 
id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: V, dtype: int64

Or

df.loc[df.groupby(level=0).idxmax()]
Out[1888]: 
id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: V, dtype: int64
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Yes, first solution worked for me. 2nd one throws an error : NotImplementedError: Indexing a MultiIndex with a DataFrame key is not implemented. – Ramon Ankersmit Feb 24 '18 at 20:13
  • @RamonAnkersmit then you can use 1st method ,multiple index always come with some funny error – BENY Feb 24 '18 at 22:43
  • 2
    The first solution doesn't return MAX, it returns just the 1st value which may not always be the max in the set. – Gary Frewin Jul 16 '21 at 11:00
  • I second what @GaryFrewin says, the first answer is just wrong I think – ffgg Feb 07 '22 at 23:23
  • 1
    @GaryFrewin in this case , since he do value_counts which return the ordered series , so the first solution will work only for this, in general case we can add sort_values – BENY Feb 08 '22 at 00:24
  • 1
    @ffgg see above ~ – BENY Feb 08 '22 at 00:24
4

I can't understand why the practical solution for this problem isn't mentioned anywhere!?

Just do this:

For DataFrame DF with Keys KEY1,KEY2 where you want the max value for every KEY1, including KEY2:

DF.groupby('KEY1').apply(lambda x: x.max())

And you'll get the maximum for each KEY1 INCLUDING the Information which KEY2 holds the maximum, relative to each KEY1.

  • 1
    this does not use the multiindex, in fact it ignores it and it would not work without a .reset_index() call first – ffgg Feb 07 '22 at 23:23
0

I had a similar question and I don't think currently this question has a good answer.

My solution was this, I think it's cleaner:

df.groupby(level=0).nlargest(1)

this keeps the multiindex object and doesn't need a lambda function

ffgg
  • 134
  • 1
  • 8
0

If you don't have previously sorted values, I think the best general answer is this variation of the one by ffggk which avoids duplicate index.

df.groupby(level=0, group_keys=False).nlargest(1)

Example:

>> df

id  tOfmAJyI
3   mlNXN       4
    SSvEP       2
    hCIpw       2
5   SSvEP       2
    hCIpw       1
    mlNXN       1
11  mlNXN       2
    SSvEP       1
Name: val, dtype: int64


>> df.groupby(level=0, group_keys=False).nlargest(1)

id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: val, dtype: int64
Bertil Johannes Ipsen
  • 1,656
  • 1
  • 14
  • 27