
I want to group by two columns ("Year" and "Size") and then count the number of accidents associated with each size. I was able to do that with this line of code:

df.groupby(["Year", "Size"])["Accidents"].count()

The output looks like this:

[image: current output]

My question is: how can I sort one of the groupby columns ("Size") in a specific, custom order, such as V_low, Low, Medium, High? I also need to drop the rows where the "Size" column is NaN.

The result should look like this:

[image: desired output]

Thanks for helping.

Eng_GR
  • 1
    [Custom sorting in pandas dataframe](https://stackoverflow.com/q/13838405/15497888) You can make it categorical before grouping. Or `reindex` after. It kinda depends on what you're looking to do. – Henry Ecker Nov 07 '21 at 19:02
  • 2
    `df['Size'] = pd.Categorical(df['Size'], categories=['V_low', 'Low', 'Medium', 'High'], ordered=True)` before `groupby` (this establishes a custom ([Categorical](https://pandas.pydata.org/docs/reference/api/pandas.Categorical.html)) ordering, so all sort operations order the column in the specified order). – Henry Ecker Nov 07 '21 at 19:06
  • 2
    Or [`reindex`](https://pandas.pydata.org/docs/reference/api/pandas.Series.reindex.html) after: `df.groupby(["Year", "Size"])["Accidents"].count().reindex(index=['V_low', 'Low', 'Medium', 'High'], level=1)` (this addresses the ordering only once; future sort operations will still be lexicographic). – Henry Ecker Nov 07 '21 at 19:07
  • Thanks so much @HenryEcker. I really appreciate it. Your answer works perfect for me. – Eng_GR Nov 07 '21 at 19:23
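
The two suggestions from the comments can be put together in a minimal sketch. The sample data is hypothetical (the question's DataFrame is only shown as an image), and the names `order`, `clean`, `counts_cat` and `counts_re` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original DataFrame is not shown in the question.
df = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020],
    "Size": ["Low", "High", "V_low", np.nan, "Medium", "Low", "Low", "High"],
    "Accidents": [1, 2, 3, 4, 5, 6, 7, 8],
})

# The custom order requested in the question.
order = ["V_low", "Low", "Medium", "High"]

# Drop rows only where "Size" is NaN; NaN in other columns is left alone.
clean = df.dropna(subset=["Size"]).copy()

# Approach 1: make "Size" an ordered Categorical before grouping.
# Later sorts on this column will also follow the custom order.
clean["Size"] = pd.Categorical(clean["Size"], categories=order, ordered=True)
counts_cat = clean.groupby(["Year", "Size"], observed=True)["Accidents"].count()

# Approach 2: group first, then reindex the "Size" level of the result.
# groupby drops NaN keys by default, so no explicit dropna is needed here.
counts_re = (
    df.groupby(["Year", "Size"])["Accidents"]
    .count()
    .reindex(index=order, level=1)
)

print(counts_cat)
print(counts_re)
```

In the first approach, `observed=True` keeps only the Year/Size combinations that actually occur; with `observed=False`, combinations that never occur (such as 2019/Medium in this sample) would show up with a count of 0. Which behaviour is preferable depends on what the final table is for.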
