I use pandas frequently and often execute code comparable to the following:
df['var_rank'] = df['var'].rank(pct=True)
print(df['var_rank'].max())
When I do, I often get values greater than 1. It happens whether I keep or drop NaN values. This is easy to fix (just divide by the largest rank value), so I'm not asking for a workaround. I'm just curious why it happens, and I haven't found any clues online.
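For reference, the fix I mean looks roughly like the sketch below. The data is made up for illustration (it is not my real data), and note that dividing by the largest rank forces the maximum to exactly 1, which need not match what rank(pct=True) is meant to return when the largest value is tied.

```python
import numpy as np
import pandas as pd

# Made-up data with ties and one missing value (not the real dataset).
df = pd.DataFrame({"var": [3.0, 1.0, 1.0, np.nan, 2.0, 3.0]})

# Plain ranks: ties get the average rank, NaN stays NaN.
ranks = df["var"].rank()

# Workaround: scale by the largest rank so the result cannot exceed 1.
df["var_rank"] = ranks / ranks.max()

print(df["var_rank"].max())
```

The NaN row stays NaN in var_rank, so it does not affect the scaling.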
Does anyone know why this happens?
Some very simple example data is here (Dropbox link, pickled pandas Series).
I get a value of 1.0156 from df.rank(pct=True).max(). I've had other data with values as high as 4 or 5. I'm usually using pretty messy data.
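In case it helps anyone reproduce this, here is a small check that could be run on any Series. It assumes that pct=True is supposed to be the plain rank divided by the number of non-missing values; that is my reading of the option, not something I have confirmed in the source. The Series below is made up and stands in for the pickled data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the pickled Series: ties plus one NaN.
s = pd.Series([3.0, 1.0, 1.0, np.nan, 2.0, 3.0])

pct = s.rank(pct=True)           # what pandas reports
manual = s.rank() / s.count()    # rank divided by the non-missing count (assumed definition)

print(pd.__version__)
print(pct.max(), manual.max())

# Largest absolute difference between the two; NaN rows are skipped by max().
max_gap = (pct - manual).abs().max()
print(max_gap)
```

If the two columns disagree, or pct.max() is above 1 while manual.max() is not, that would at least narrow down where the extra factor comes from.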