Need uniform tied rankings for a column in pandas

Question

I'm fairly new to Pandas. I have a DataFrame like below:

In [47]: print(d1)
         date name            sector     value
0  2014-10-31    A  Information Tech  -3.18229
1  2014-10-31    B       Industrials  -52.1333
2  2014-10-31    C  Consumer Discret   45.3428
3  2014-10-31    D       Industrials   -4.4901
4  2014-10-31    E       Industrials   6.85653
5  2014-10-31    F  Information Tech   4.56422
6  2014-10-31    H  Information Tech  29.31419
7  2014-10-31    G  Information Tech   6.52422
8  2014-10-31    I            Sports  16.52422
9  2014-10-31    J            Sports   2.62176

Use-case is, for a particular day of records, I need uniform ranking for the value column. Rank should always lie between 1 to 6 irrespective of the number of records. In case of a tie, I need to give an average of the rank for those records.

For simplicity, I've kept the total_rows as just 10 above, in reality, it is a much higher number for every date(in range of thousands). And the range of ranks would be from 1 to 100

Output can be something like below:

         date name            sector     value  rank
0  2014-10-31    A  Information Tech  -3.18229  5
1  2014-10-31    B       Industrials  -52.1333  6
2  2014-10-31    C  Consumer Discret   45.3428  1
3  2014-10-31    D       Industrials   -4.4901  5
4  2014-10-31    E       Industrials   6.85653  3
5  2014-10-31    F  Information Tech   4.56422  3
6  2014-10-31    H  Information Tech  29.31419  1
7  2014-10-31    G  Information Tech   6.52422  3
8  2014-10-31    I            Sports  16.52422  2
9  2014-10-31    J            Sports   2.62176  4

What is the best way to provide uniform ranks? I tried to search a lot, could not find anything useful.

is rank going to be random number or some function of `value`? — jkhadka, Oct 09 '18 at 13:04
Rank should always fall between 1 to 6 in case of the DF I shared above. — Mayank Porwal, Oct 09 '18 at 13:09

Scott Boston · Answer 1 · 2018-10-09T13:16:58.047

1

You can try using pd.qcut:

df['rank'] = pd.qcut(df['value'], 6, [*'654321'])

Or

df['rank'] = pd.qcut(df['value'], 6, labels = ['6','5','4','3','2','1'])

Output:

             date name            sector     value rank
0 2014-10-31         A  Information Tech  -3.18229    5
1 2014-10-31         B       Industrials -52.13330    6
2 2014-10-31         C  Consumer Discret  45.34280    1
3 2014-10-31         D       Industrials  -4.49010    6
4 2014-10-31         E       Industrials   6.85653    3
5 2014-10-31         F  Information Tech   4.56422    4
6 2014-10-31         H  Information Tech  29.31419    1
7 2014-10-31         G  Information Tech   6.52422    3
8 2014-10-31         I            Sports  16.52422    2
9 2014-10-31         J            Sports   2.62176    5

edited Oct 09 '18 at 13:16

answered Oct 09 '18 at 13:04

Scott Boston

147,308
15
139
187

Your command is failing due to Syntax error at '*'. So, I removed it. Then it fails saying, `Bin labels must be one fewer than the number of bin edges` – Mayank Porwal Oct 09 '18 at 13:13
I am using unpacking here, this may be a version issue, replace [*'654321'] with ['6','5','4',3','2','1']. `pd.qcut(df['value'], 6, labels = ['6','5','4','3','2','1'])` – Scott Boston Oct 09 '18 at 13:15
yes, looked like a difference in the version. I am using pandas 0.23.4 – Mayank Porwal Oct 09 '18 at 13:41
What version of python are you using? – Scott Boston Oct 09 '18 at 13:46
I am using Python 3.5.2 – Mayank Porwal Oct 09 '18 at 14:02
Strange. The first syntax should work. Nonetheless, does the second syntax work? – Scott Boston Oct 09 '18 at 14:02
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181590/discussion-between-mayank-porwal-and-scott-boston). – Mayank Porwal Oct 10 '18 at 06:07
I digged in a bit deeper. So, `qcut` fails if there are duplicate values. I took help from @luca 's answer in the following post: https://stackoverflow.com/questions/20158597/how-to-qcut-with-non-unique-bin-edges Now, everything works. Thanks. – Mayank Porwal Oct 10 '18 at 15:37

Mayank Porwal · Accepted Answer · 2018-11-16T11:25:58.207

0

So, I was able to find the apt solution for my use-case. I use cut functions with pandas rank function.

df['rank'] = pd.cut(df['value'], 100, labels = list(range(1,101)))

edited Nov 16 '18 at 11:25

answered Oct 15 '18 at 10:08

Mayank Porwal

33,470
8
37
58

Need uniform tied rankings for a column in pandas

2 Answers2