How to get the most frequent row in table

Question

How to get the most frequent row in a DataFrame? For example, if I have the following table:

   col_1  col_2 col_3
0      1      1     A
1      1      0     A
2      0      1     A
3      1      1     A
4      1      0     B
5      1      0     C

Expected result:

   col_1  col_2 col_3
0      1      1     A

EDIT: I need the most frequent row (as one unit), not the most frequent value of each column, which is what the mode() method calculates.
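To make the distinction in the edit concrete, here is a small sketch that rebuilds the example table and calls DataFrame.mode(). mode() works column by column, so its result is assembled from per-column winners and is not guaranteed to be a row that exists in the table. Here col_2 has a tie between 0 and 1, so mode() even returns two rows.

```python
import pandas as pd

# Rebuild the example table from the question
df = pd.DataFrame({
    "col_1": [1, 1, 0, 1, 1, 1],
    "col_2": [1, 0, 1, 1, 0, 0],
    "col_3": ["A", "A", "A", "A", "B", "C"],
})

# mode() is computed independently for each column; ties produce extra rows
# (padded with NaN in the columns that have fewer modes)
print(df.mode())
```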

score 12 · Answer 1 · answered Sep 28 '20 at 14:52


Use groupby over all columns and count each group with size:

df.groupby(df.columns.tolist()).size().sort_values().tail(1).reset_index().drop(columns=0)
   col_1  col_2 col_3
0      1      1     A

answered Sep 28 '20 at 14:52

BENY



Alternative: `df.groupby(df.columns.tolist(), as_index=False).size().sort_values('size').tail(1).drop(columns='size')` – Mykola Zotko Sep 28 '20 at 19:31
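Both variants above keep exactly one row via tail(1), so if several rows share the top count, all but one are silently dropped. A minimal sketch, assuming you want every tied row instead, filters the group sizes against their maximum (as_index=False returning a DataFrame with a "size" column requires pandas 1.1 or later):

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 1, 0, 1, 1, 1],
    "col_2": [1, 0, 1, 1, 0, 0],
    "col_3": ["A", "A", "A", "A", "B", "C"],
})

# One group per distinct row; "size" is how many times that row occurs
counts = df.groupby(df.columns.tolist(), as_index=False).size()

# Keep every row whose count equals the maximum, so ties are not lost
most_frequent = (
    counts[counts["size"] == counts["size"].max()]
    .drop(columns="size")
    .reset_index(drop=True)
)
print(most_frequent)
```

Note that groupby drops groups whose keys contain NaN by default, so rows with missing values are not counted.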

Divakar · Answer 2 · 2020-09-28T15:15:06.347

With NumPy's np.unique -

import numpy as np

u, idx, c = np.unique(df.values.astype(str), axis=0, return_index=True, return_counts=True)
df.iloc[[idx[c.argmax()]]]

Output:

   col_1  col_2 col_3
0      1      1     A

If you are looking for performance, convert the string column to integer codes (here with pd.factorize) and run np.unique on the resulting numeric array:

a = np.c_[df.col_1, df.col_2, pd.factorize(df.col_3)[0]]
u, idx, c = np.unique(a, axis=0, return_index=True, return_counts=True)
df.iloc[[idx[c.argmax()]]]
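A self-contained sketch of the factorize variant, using the question's example data. idx holds the position of the first occurrence of each unique row, so idx[c.argmax()] maps the most common unique row back to a row position in the original frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 1, 0, 1, 1, 1],
    "col_2": [1, 0, 1, 1, 0, 0],
    "col_3": ["A", "A", "A", "A", "B", "C"],
})

# Encode the string column as integer codes so the stacked array is numeric
a = np.c_[df.col_1, df.col_2, pd.factorize(df.col_3)[0]]

# Unique rows, the index of each one's first occurrence, and how often each occurs
u, idx, c = np.unique(a, axis=0, return_index=True, return_counts=True)

# Select the first occurrence of the most common row from the original frame
result = df.iloc[[idx[c.argmax()]]]
print(result)
```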

score 4 · DDD1 · Answer 3 · 2020-09-28T19:10:52.850

You can do this with groupby and size:

counts = df.groupby(df.columns.tolist(), as_index=False).size()
result = counts.loc[[counts["size"].idxmax()]].drop(columns="size")
result = result.reset_index(drop=True)  # optional: renumber the index from 0

edited Sep 28 '20 at 19:10

answered Sep 28 '20 at 14:57

DDD1


You have to check your code. How do you get the `'size'` column? – Mykola Zotko Sep 28 '20 at 19:06
You are right, I added the "as_index=False" that I somehow omitted when writing it down. Thanks! – DDD1 Sep 28 '20 at 19:12
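The comment exchange above hinges on what groupby(...).size() returns. A short sketch contrasting the two forms on the question's data: with the default as_index=True you get a Series indexed by the group keys, while as_index=False (pandas 1.1+) gives a DataFrame with the keys as columns plus a "size" column. Because that DataFrame has a default integer index, the label from idxmax() can be passed to loc.

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 1, 0, 1, 1, 1],
    "col_2": [1, 0, 1, 1, 0, 0],
    "col_3": ["A", "A", "A", "A", "B", "C"],
})

# Default as_index=True: a Series whose index is built from the group keys
by_index = df.groupby(df.columns.tolist()).size()

# as_index=False: a DataFrame with the key columns and a "size" column
as_frame = df.groupby(df.columns.tolist(), as_index=False).size()

print(type(by_index).__name__)
print(as_frame.columns.tolist())

# idxmax() returns an index label, so select with loc rather than iloc
result = as_frame.loc[[as_frame["size"].idxmax()]].drop(columns="size")
print(result)
```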

score 3 · Answer 4 · answered Sep 28 '20 at 17:08

The numpy_indexed library (imported as npi) handles 'groupby'-type problems with less code and performance similar to NumPy. This is an alternative, quite similar to @Divakar's np.unique()-based solution:

import numpy_indexed as npi

arr = df.values.astype(str)
counts = npi.multiplicity(arr)  # for each row: how many times it occurs in arr
output = df.iloc[[counts.argmax()]]
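If numpy_indexed is not installed, the same per-row multiplicity can be reproduced with plain NumPy. This is a swapped-in technique, not the library's own implementation: np.unique with return_inverse maps every row to its unique row, and indexing the counts by that mapping gives each row's number of occurrences.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 1, 0, 1, 1, 1],
    "col_2": [1, 0, 1, 1, 0, 0],
    "col_3": ["A", "A", "A", "A", "B", "C"],
})

arr = df.values.astype(str)

# inverse[i] is the unique-row index of arr[i]; counts[k] is how often unique row k occurs
_, inverse, counts = np.unique(arr, axis=0, return_inverse=True, return_counts=True)

# Per-row multiplicity; ravel() guards against NumPy versions that return
# inverse with an extra dimension when axis is given
multiplicity = counts[inverse.ravel()]

# argmax picks the first row position with the highest multiplicity
output = df.iloc[[multiplicity.argmax()]]
print(output)
```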

Mykola Zotko · Accepted Answer · 2020-10-05T13:35:07.187

Since pandas 1.1.0, you can use the method DataFrame.value_counts() to count unique rows in a DataFrame:

df.value_counts()

Output:

col_1  col_2  col_3
1      1      A        2
       0      C        1
              B        1
              A        1
0      1      A        1

Because the counts are sorted in descending order by default, the first entry is the most frequent row:

df.value_counts().head(1).index.to_frame(index=False)

Output:

   col_1  col_2 col_3
0      1      1     A
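head(1) returns a single row even when several rows share the top count. A minimal sketch, assuming you want all tied rows, compares every count with the first (largest) one before converting the index back to a DataFrame. By default, rows containing NaN are left out of value_counts() (see its dropna parameter).

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 1, 0, 1, 1, 1],
    "col_2": [1, 0, 1, 1, 0, 0],
    "col_3": ["A", "A", "A", "A", "B", "C"],
})

# Count each distinct row; the result is sorted in descending order
counts = df.value_counts()

# Keep every row whose count equals the highest count, then rebuild a DataFrame
top = counts[counts == counts.iloc[0]].index.to_frame(index=False)
print(top)
```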
