Getting max values from pandas multiindex dataframe

Question

Im trying to retrieve only the max values (including the multi index values) from a pandas dataframe that has multiple indexes. The dataframe I have is generated via a groupby and column selection ('tOfmAJyI') like this:

df.groupby('id')['tOfmAJyI'].value_counts()

Out[4]: 
id     tOfmAJyI
3      mlNXN       4
       SSvEP       2
       hCIpw       2
5      SSvEP       2
       hCIpw       1
       mlNXN       1
11     mlNXN       2
       SSvEP       1
...

What I would like to achieve is to get the max values including their corresponding index values. So something like:

id     tOfmAJyI
3      mlNXN       4
5      SSvEP       2
11     mlNXN       2
...

Any ideas how I can achieve this? I was able to get the id and max value but I'm still trying to get the corresponding value of 'tOfmAJyI'.

That was a really clever way to use `value_counts()`. Thanks :) — joaoavf, Feb 23 '18 at 21:36

score 11 · Answer 1 · answered Feb 23 '18 at 21:38

11

groupby + head

df.groupby(level=0).head(1)
Out[1882]: 
id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: V, dtype: int64

Or

df.loc[df.groupby(level=0).idxmax()]
Out[1888]: 
id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: V, dtype: int64

answered Feb 23 '18 at 21:38

BENY

317,841
20
164
234

Yes, first solution worked for me. 2nd one throws an error : NotImplementedError: Indexing a MultiIndex with a DataFrame key is not implemented. – Ramon Ankersmit Feb 24 '18 at 20:13
@RamonAnkersmit then you can use 1st method ，multiple index always come with some funny error – BENY Feb 24 '18 at 22:43
2

The first solution doesn't return MAX, it returns just the 1st value which may not always be the max in the set. – Gary Frewin Jul 16 '21 at 11:00
I second what @GaryFrewin says, the first answer is just wrong I think – ffgg Feb 07 '22 at 23:23
1

@GaryFrewin in this case , since he do value_counts which return the ordered series , so the first solution will work only for this, in general case we can add sort_values – BENY Feb 08 '22 at 00:24
1

@ffgg see above ~ – BENY Feb 08 '22 at 00:24

score 4 · Answer 2 · answered Jul 02 '20 at 08:35

4

I can't understand why the practical solution for this problem isn't mentioned anywhere!?

Just do this:

For DataFrame DF with Keys KEY1,KEY2 where you want the max value for every KEY1, including KEY2:

DF.groupby('KEY1').apply(lambda x: x.max())

And you'll get the maximum for each KEY1 INCLUDING the Information which KEY2 holds the maximum, relative to each KEY1.

answered Jul 02 '20 at 08:35

Spirit_Gate

41
1

1

this does not use the multiindex, in fact it ignores it and it would not work without a .reset_index() call first – ffgg Feb 07 '22 at 23:23

score 0 · Answer 3 · answered Feb 07 '22 at 23:22

0

I had a similar question and I don't think currently this question has a good answer.

My solution was this, I think it's cleaner:

df.groupby(level=0).nlargest(1)

this keeps the multiindex object and doesn't need a lambda function

answered Feb 07 '22 at 23:22

ffgg

134
1
8

score 0 · Answer 4 · answered Oct 03 '22 at 11:56

If you don't have previously sorted values, I think the best general answer is this variation of the one by ffggk which avoids duplicate index.

df.groupby(level=0, group_keys=False).nlargest(1)

Example:

>> df

id  tOfmAJyI
3   mlNXN       4
    SSvEP       2
    hCIpw       2
5   SSvEP       2
    hCIpw       1
    mlNXN       1
11  mlNXN       2
    SSvEP       1
Name: val, dtype: int64


>> df.groupby(level=0, group_keys=False).nlargest(1)

id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: val, dtype: int64

Getting max values from pandas multiindex dataframe

4 Answers4

Linked