I have a dataframe with two columns, col_1 and col_2. The values in col_2 correspond to the values in col_1.
print(df)
    col_1  col_2
1       a     12
2       a     33
3       a     11
4       a      4
5       a     42
6       a     66
7       a      9
8       b     12
9       b     34
10      b     42
11      b     64
12      b     86
13      b      2
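To reproduce the example, the frame printed above can be rebuilt like this. This is only a sketch; the index is set to start at 1 purely to match the printout.

```python
import pandas as pd

# Rebuild the sample frame; index starts at 1 to match the printed output.
df = pd.DataFrame(
    {
        "col_1": ["a"] * 7 + ["b"] * 6,
        "col_2": [12, 33, 11, 4, 42, 66, 9, 12, 34, 42, 64, 86, 2],
    },
    index=range(1, 14),
)
print(df)
```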
What I am trying to do: for each value in col_1 (a, b, c, ...), sort the corresponding values in col_2 and select ONLY the top 5 values. The new dataframe is expected to look like this:
I tried the approach of dropping duplicates used here, since col_2 can sometimes contain duplicates, but it didn't work:
df.sort_values('col_2', ascending=False).drop_duplicates('col_1').sort_index()
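One likely reason the attempt above falls short: `drop_duplicates` on the group column keeps only the first row for each col_1 value, so after a descending sort it returns a single row (the largest col_2) per group rather than five. A common pandas idiom is to sort and then call `groupby(...).head(5)`, which keeps the first five rows of each group in the current order. A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col_1": ["a"] * 7 + ["b"] * 6,
        "col_2": [12, 33, 11, 4, 42, 66, 9, 12, 34, 42, 64, 86, 2],
    },
    index=range(1, 14),
)

# Sort by group, then by col_2 descending within each group.
ordered = df.sort_values(["col_1", "col_2"], ascending=[True, False])

# head(5) on a groupby keeps the first 5 rows of each group in the
# current order, i.e. the 5 largest col_2 values per col_1 value.
top5 = ordered.groupby("col_1").head(5)
print(top5)
```

Rows with tied col_2 values are kept as separate rows here, so a group can show the same value more than once among its five.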
Any suggestions would be appreciated.
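If repeated col_2 values within a group should only count once (an assumption about the intended behaviour), one option is to drop repeated (col_1, col_2) pairs before taking the top 5. The extra row below is hypothetical, added only to show the difference between the two results:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col_1": ["a"] * 7 + ["b"] * 6,
        "col_2": [12, 33, 11, 4, 42, 66, 9, 12, 34, 42, 64, 86, 2],
    },
    index=range(1, 14),
)
# Hypothetical extra row duplicating a's largest value, for illustration only.
extra = pd.DataFrame({"col_1": ["a"], "col_2": [66]}, index=[14])
df = pd.concat([df, extra])

ordered = df.sort_values(["col_1", "col_2"], ascending=[True, False])

# Duplicates kept: group "a" gets 66 twice among its five rows.
top5_rows = ordered.groupby("col_1").head(5)

# Drop repeated (col_1, col_2) pairs first to get the 5 largest distinct values.
top5_distinct = (
    ordered.drop_duplicates(subset=["col_1", "col_2"])
    .groupby("col_1")
    .head(5)
)
print(top5_distinct)
```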