Pandas Dataframe: Find unique value from one column which has the largest number of unique values in another column

Question

I have the following pandas dataframe

df = pd.DataFrame([[99, 88, 88, 66, 66, 99, 66, 77, 77, 77, 66, 66, 99, 99], list('DAABBBBABCBDDD'), ['***','**','****','*','***','*','**','***','*','*','****','**','**','****']]).T
df.columns = ['col1','col2','col3']

Assume that col1 are companies and col2 are products types. I am looking for the company with the largest number of different product types.

So I am looking for which unique value from col1 has the largest number of unique values in col2

I have tried the following:

df.groupby(['col1'])['col2'].nunique()

which returns:

Now I would like to get the value from col1 with the highest value in col2. Which is:

77    3

I have tried

df.groupby(['col2'])['col1'].nunique().max()

However I only receive the max of unique values in col2

Instead, I would like to know both the max value from col2 and to which value in col1 this belongs. I.e.

 77    3

Thank you for your help!

score 0 · Accepted Answer · answered Feb 01 '22 at 12:00

0

I would like to know both the max value from col2 and to which value in col1 this belongs.

With your result, call both:

result = df.groupby(['col1'])['col2'].nunique()
result.idxmax()  # 77
result.max()  # 3

You could also convert it to a DataFrame before calling .loc[lambda d: d.idxmax()] but I don't know why you would want to do that.

answered Feb 01 '22 at 12:00

ifly6

5,003
2
24
47

better use `idx = result.idxmax() ; result[idx]` to avoid unnecessarily computing the max twice – mozway Feb 01 '22 at 12:04

score 0 · Answer 2 · answered Feb 01 '22 at 12:14

0

Try this,

grouped=pd.DataFrame(df.groupby(['col1'])['col2'].nunique()).reset_index()
grouped[grouped['col2'] == grouped["col2"].max()]

answered Feb 01 '22 at 12:14

Abishek VK

506
3
12

Pandas Dataframe: Find unique value from one column which has the largest number of unique values in another column

2 Answers2