
I have a dataframe of numerical and categorical columns which I am trying to group by certain columns and aggregate.

I want to apply the mode function to the categorical columns and other statistical functions such as sum and min to the remaining columns of the same pandas dataframe.

I am not able to get the mode for the categorical columns.

What I tried so far is:

df_agg = (
    df.sort_values(by=['userId', 'num1', 'num2'], ascending=False)
      .groupby(['userId', 'cat1', 'cat2'])
      .agg({'num3': 'sum', 'num2': 'sum', 'num1': 'sum', 'num4': 'sum',
            'num5': 'sum', 'num6': 'sum', 'num7': 'min',
            'cat3': 'mode', 'cat4': 'mode'})
      .reset_index()
)

That gives the error: 'SeriesGroupBy' object has no attribute 'mode'.

How can I get the mode of these categorical columns in this case?

R.A

1 Answer


Use a lambda function. Because mode can return multiple values, select the first one:

..., 'cat4':lambda x: x.mode().iat[0]}
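
As a minimal, self-contained sketch of the same idea: the toy data below is made up (only the column names follow the question), and first_mode is a hypothetical helper name. It also guards against a group whose values are all missing, where mode() returns an empty Series and iat[0] would raise an IndexError.

```python
import pandas as pd

# Hypothetical toy data; column names follow the question, values are invented.
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "cat1": ["a", "a", "a", "b", "b"],
    "num1": [10, 20, 30, 40, 50],
    "cat4": ["x", "y", "y", "z", "w"],
})


def first_mode(s: pd.Series):
    """Return the first mode of a group, or None if the group has no mode."""
    modes = s.mode()  # Series of all modes, in sorted order
    return modes.iat[0] if not modes.empty else None


df_agg = (
    df.groupby(["userId", "cat1"])
      .agg({"num1": "sum", "cat4": first_mode})  # string name for sum, callable for mode
      .reset_index()
)
print(df_agg)
```

When a group has several modes, as userId 2 does here, the one that sorts first is kept, because Series.mode returns its result sorted.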

EDIT: If you need all modes:

..., 'cat4':lambda x: list(x.mode())}).reset_index().explode('cat4')
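
A runnable sketch of the one-row-per-mode variant, again on invented toy data with column names borrowed from the question. It assumes a pandas version that provides DataFrame.explode (0.25 or later).

```python
import pandas as pd

# Hypothetical toy data; userId 2 has two modes in cat4 ("w" and "z").
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "num1": [1, 2, 3, 4, 5],
    "cat4": ["x", "y", "y", "z", "w"],
})

df_all = (
    df.groupby("userId")
      .agg({"num1": "sum", "cat4": lambda s: list(s.mode())})  # keep every mode as a list
      .reset_index()
      .explode("cat4")          # one output row per mode
      .reset_index(drop=True)   # explode repeats the index, so renumber it
)
print(df_all)
```

Note that explode repeats the other aggregated columns on every row it creates, so the sum for userId 2 appears twice here. Summing that column again afterwards would double count.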
jezrael
  • iat[0] gives me the first value if multiple modes exist? and what if I want one row per mode ? – R.A Feb 09 '23 at 08:19
  • @R.A - `iat[0] gives me the first value if multiple modes exist` - yes, exactly. `and what if I want one row per mode` - try `'cat4':lambda x: list(x.mode())` and then, on the aggregated result, `df = df.explode('cat4')` – jezrael Feb 09 '23 at 08:22
  • Thanks so much. I am new to Python and dataframe manipulation. Can you recommend any courses or resources on data cleaning and manipulation for machine learning? – R.A Feb 09 '23 at 08:24
  • @R.A - You can check [this](https://pandas.pydata.org/docs/getting_started/tutorials.html) – jezrael Feb 09 '23 at 08:26