How to select the top n rows from each group after a groupby in pandas?

Question

I have a pandas DataFrame with the following columns:

 open_year, open_month, type, col1, col2, ...

I'd like to find the top types in each (year, month), so I first count how often each type occurs in each (year, month):

freq_df = df.groupby(['open_year','open_month','type']).size().reset_index()
freq_df.columns = ['open_year','open_month','type','count']
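For reference, here is a minimal runnable sketch of this counting step on a small made-up DataFrame (the sample values are hypothetical). It uses Series.reset_index(name='count'), which names the count column in one step instead of reassigning freq_df.columns afterwards; the resulting columns are the same four.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df
df = pd.DataFrame({
    "open_year":  [2018, 2018, 2018, 2018, 2018, 2018],
    "open_month": [1, 1, 1, 1, 2, 2],
    "type":       ["a", "a", "b", "c", "b", "b"],
})

# Count rows per (open_year, open_month, type); reset_index(name=...) turns the
# group keys back into columns and names the size column "count"
freq_df = (
    df.groupby(["open_year", "open_month", "type"])
      .size()
      .reset_index(name="count")
)
print(freq_df)
```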

Then I want to find the top n types by frequency (i.e. count) for each (year, month). How can I do that?

I can use nlargest, but the result is missing the type column:

freq_df.groupby(['open_year','open_month'])['count'].nlargest(5)
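A likely cause of the error quoted in the comments below is calling nlargest on the DataFrameGroupBy itself; selecting ['count'] first gives a SeriesGroupBy, which does provide nlargest. The type is also not truly lost: the result's index holds the group keys followed by the original row labels of freq_df, so those labels can be passed back to .loc. A sketch on hypothetical sample data, assuming pandas' default group_keys=True:

```python
import pandas as pd

# Hypothetical counts table shaped like the question's freq_df
freq_df = pd.DataFrame({
    "open_year":  [2018] * 5,
    "open_month": [1, 1, 1, 2, 2],
    "type":       ["a", "b", "c", "a", "b"],
    "count":      [5, 9, 1, 3, 7],
})

n = 2
top = freq_df.groupby(["open_year", "open_month"])["count"].nlargest(n)
# top's MultiIndex is (open_year, open_month, original row label); the last
# level points back into freq_df, so .loc recovers the full rows incl. type
result = freq_df.loc[top.index.get_level_values(-1)]
print(result)
```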

Please read [mcve] and edit your question accordingly. It will make it more useful to the community and easier to answer. – piRSquared May 18 '18 at 16:28
It complains: "Cannot access callable attribute 'nlargest' of 'DataFrameGroupBy' objects, try using the 'apply' method" – HHH May 18 '18 at 16:33

score 8 · Accepted Answer · answered May 18 '18 at 16:39


I'd recommend sorting your counts in descending order first and then calling GroupBy.head:

(freq_df.sort_values('count', ascending=False)
        .groupby(['open_year','open_month'], sort=False).head(5)
)
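To check that this yields n rows per group rather than n rows overall, here is a self-contained sketch with hypothetical counts and n = 2. GroupBy.head(n) returns the first n rows of every group in the current row order, so after the descending sort those are the n largest counts per (year, month); a group with fewer than n rows is returned whole. The final sort_values only tidies the output order and is optional.

```python
import pandas as pd

# Hypothetical counts table shaped like the question's freq_df
freq_df = pd.DataFrame({
    "open_year":  [2018] * 6,
    "open_month": [1, 1, 1, 2, 2, 2],
    "type":       ["a", "b", "c", "a", "b", "c"],
    "count":      [5, 9, 1, 3, 7, 4],
})

n = 2
top_n = (
    freq_df.sort_values("count", ascending=False)      # largest counts first
           .groupby(["open_year", "open_month"], sort=False)
           .head(n)                                     # first n rows of each group
           .sort_values(["open_year", "open_month", "count"],
                        ascending=[True, True, False])  # optional: tidy ordering
)
print(top_n)
```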


cs95


It then returns only 5 rows, but I need 5 rows per (year, month). – HHH May 18 '18 at 16:41
@H.Z. The `.groupby(['open_year','open_month'], sort=False).head(5)` should take care of that. – cs95 May 18 '18 at 16:42
