
I have a pandas DataFrame with the following columns:

 open_year, open_month, type, col1, col2, ....

I'd like to find the top type in each (year, month), so I first compute the count of each type in each (year, month):

freq_df = df.groupby(['open_year','open_month','type']).size().reset_index()
freq_df.columns = ['open_year','open_month','type','count']

Then I want to find the top n types by frequency (i.e. count) for each (year, month). How can I do that?

I can use nlargest, but the result is missing the type column:

freq_df.groupby(['open_year','open_month'])['count'].nlargest(5)

HHH
  • Please read [mcve] and edit your question accordingly. It will make it more useful to the community and easier to answer. – piRSquared May 18 '18 at 16:28
  • It's complaining: "Cannot access callable attribute 'nlargest' of 'DataFrameGroupBy' objects, try using the 'apply' method" – HHH May 18 '18 at 16:33
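
The error quoted in the last comment suggests nlargest was called on the whole grouped frame rather than on the count column. Here is a minimal sketch, on made-up counts, of calling it on the grouped count Series while keeping type by moving it into the index first. The sample values and n = 2 are assumptions for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical counts; only the column names come from the question.
freq_df = pd.DataFrame({
    'open_year':  [2018, 2018, 2018, 2019, 2019, 2019],
    'open_month': [1, 1, 1, 2, 2, 2],
    'type':       ['a', 'b', 'c', 'x', 'y', 'z'],
    'count':      [3, 2, 1, 5, 4, 1],
})

n = 2  # illustrative number of top types per (year, month)

# Selecting ['count'] yields a SeriesGroupBy, which provides nlargest.
# Setting 'type' as the index first carries it into the result's index,
# and reset_index() turns it back into an ordinary column.
top_n = (freq_df.set_index('type')
                .groupby(['open_year', 'open_month'])['count']
                .nlargest(n)
                .reset_index())
```

This keeps nlargest but needs the extra set_index/reset_index round trip, so sorting and taking the head of each group may be simpler.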

1 Answer


I'd recommend sorting your counts in descending order first, then calling GroupBy.head:

(freq_df.sort_values('count', ascending=False)
        .groupby(['open_year','open_month'], sort=False).head(5)
)
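
For a self-contained illustration, here is a minimal sketch of the same sort-then-head approach run end to end on made-up data. The sample rows and n = 2 are assumptions for demonstration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'open_year':  [2018] * 6 + [2019] * 6,
    'open_month': [1] * 6 + [2] * 6,
    'type':       ['a', 'a', 'a', 'b', 'b', 'c',
                   'x', 'x', 'x', 'y', 'y', 'z'],
})

# Count each type within each (year, month).
freq_df = (df.groupby(['open_year', 'open_month', 'type'])
             .size()
             .reset_index(name='count'))

n = 2  # number of top types to keep per group (illustrative)

# Sort by count, then keep the first n rows of each (year, month) group.
# The 'type' column is retained because whole rows are selected.
top_n = (freq_df.sort_values('count', ascending=False)
                .groupby(['open_year', 'open_month'], sort=False)
                .head(n))
```

Note that if several types tie on count at the cutoff, which of them is kept depends on the sort order, so a secondary sort key may be worth adding.
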
cs95
  • It then returns only 5 rows, but I need to get 5 rows per (year, month). – HHH May 18 '18 at 16:41
  • @H.Z. The `.groupby(['open_year','open_month'], sort=False).head(5)` should take care of that. – cs95 May 18 '18 at 16:42
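
One way to check that head applies per group rather than to the whole frame is to count the rows per (year, month) in the result. A minimal sketch, assuming made-up counts and n = 2 (both hypothetical):

```python
import pandas as pd

# Hypothetical counts; only the column names come from the question.
freq_df = pd.DataFrame({
    'open_year':  [2018] * 4 + [2019] * 3,
    'open_month': [1] * 4 + [2] * 3,
    'type':       list('abcd') + list('xyz'),
    'count':      [9, 7, 5, 3, 8, 6, 4],
})

n = 2  # illustrative

top_n = (freq_df.sort_values('count', ascending=False)
                .groupby(['open_year', 'open_month'], sort=False)
                .head(n))

# Number of rows each (year, month) contributes to the result.
# Every group should contribute at most n rows.
rows_per_group = top_n.groupby(['open_year', 'open_month']).size()
print(rows_per_group)
```

If the result has only n rows in total, it is worth checking that freq_df really contains more than one (year, month) combination.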