Pandas Return top N from each group in MultiIndex

Question

I have a large multi-index dataframe with about 3 million data points. The first index is the date, the second is the name of the object. Then, each object has a value like so:

df = pd.DataFrame({'year': [2012, 2012, 2012, 2013, 2013, 2013],
'item':['apples', 'metals', 'water', 'apples', 'metals', 'water'],
'value': [10, 14, 16, 17, 13, 14]})

df.set_index(['year', 'item'], inplace = True)

This is a smaller example of the dataframe I'm working with. I want to return a dataframe that keeps the exact same format, but only has the top two values from each year.

So the final dataframe should output metals and water from 2012 with their value, and apples and water from 2013 with their respective value, and in the same format as the original dataframe.

score 4 · Accepted Answer · answered Aug 22 '18 at 18:27

4

IIUC sort_values before groupby tail

df.sort_values('value').groupby(level=0).tail(2).sort_index()
Out[258]: 
             value
year item         
2012 metals     14
     water      16
2013 apples     17
     water      14

answered Aug 22 '18 at 18:27

BENY

317,841
20
164
234

Sadly, the order is not keep in each subgroup – el Josso Apr 12 '22 at 11:20

score 2 · Answer 2 · answered Aug 22 '18 at 18:30

2

Applying nlargest

pd.concat([d.nlargest(2, columns=['value']) for _, d in df.groupby('year')])

             value
year item         
2012 water      16
     metals     14
2013 apples     17
     water      14

answered Aug 22 '18 at 18:30

piRSquared

285,575
57
475
624

Pandas Return top N from each group in MultiIndex

2 Answers2