0

I have the following pandas dataframe

df = pd.DataFrame([[99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99], list('DAABBBBABCBDDD'), ['***','**','****','*','***','*','**','***','*','*','****','**','**','****']]).T
df.columns = ['col1','col2','col3']

Assume that col1 are companies and col2 are products types. I am looking for the company with the largest number of different product types.

So I am looking for which unique value from col1 has the largest number of unique values in col2

I have tried the following:

df.groupby(['col1'])['col2'].nunique()

which returns:

col1
66    2
77    3
88    1
99    2

Now I would like to get the value from col1 with the highest value in col2. Which is:

77    3

I have tried

df.groupby(['col2'])['col1'].nunique().max()

However I only receive the max of unique values in col2

3

Instead, I would like to know both the max value from col2 and to which value in col1 this belongs. I.e.

 77    3

Thank you for your help!

mozway
  • 194,879
  • 13
  • 39
  • 75
Julian
  • 9
  • 3

2 Answers2

0

I would like to know both the max value from col2 and to which value in col1 this belongs.

With your result, call both:

result = df.groupby(['col1'])['col2'].nunique()
result.idxmax()  # 77
result.max()  # 3

You could also convert it to a DataFrame before calling .loc[lambda d: d.idxmax()] but I don't know why you would want to do that.

ifly6
  • 5,003
  • 2
  • 24
  • 47
  • better use `idx = result.idxmax() ; result[idx]` to avoid unnecessarily computing the max twice – mozway Feb 01 '22 at 12:04
0

Try this,

grouped=pd.DataFrame(df.groupby(['col1'])['col2'].nunique()).reset_index()
grouped[grouped['col2'] == grouped["col2"].max()]
Abishek VK
  • 506
  • 3
  • 12