I'm trying to convert dataset values into percentiles. I wrote a function to do this, but it raises an error, whereas the same code runs fine on its own outside the function. Could someone please help me figure out why the code fails inside the function? Thank you so much for your help.
I have the following dataset:
    A   B   C   D
0  31  78  10  35
1  73  78   6  69
2  59  24  26   0
3  87  55  13  41
4  13   9  32  97
5  32  93  71  52
6  35  72  63  10
7  30  40  29  30
8  85  85  31   2
And I wanted to get percentiles for each value with the following function:
import pandas as pd

data = pd.read_csv('datafile.csv')

def percentile_convert(x):
    x['A_Prcnt'] = pd.qcut(x.A, 100, labels=False) / 100
    x['B_Prcnt'] = pd.qcut(x.B, 100, labels=False) / 100
    x['C_Prcnt'] = pd.qcut(x.C, 100, labels=False) / 100
    x['D_Prcnt'] = pd.qcut(x.D, 100, labels=False) / 100
    x = x[['A_Prcnt', 'B_Prcnt', 'C_Prcnt', 'D_Prcnt']]
    return x

data = data.apply(percentile_convert, axis=1)
Once I run this, I get the following error:
ValueError: ("Bin edges must be unique: array([31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,\n 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,\n 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,\n 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,\n 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,\n 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31]).\nYou can drop duplicate edges by setting the 'duplicates' kwarg", 'occurred at index 0')
But if I run the same code outside of a function (with duplicates='drop' added to each qcut call), like this:
data['A_Prcnt'] = pd.qcut(data.A, 100, labels=False, duplicates='drop') / 100
data['B_Prcnt'] = pd.qcut(data.B, 100, labels=False, duplicates='drop') / 100
data['C_Prcnt'] = pd.qcut(data.C, 100, labels=False, duplicates='drop') / 100
data['D_Prcnt'] = pd.qcut(data.D, 100, labels=False, duplicates='drop') / 100
data = data[['A_Prcnt', 'B_Prcnt', 'C_Prcnt', 'D_Prcnt']]
print(data)
I get back the desired result, which is:
   A_Prcnt  B_Prcnt  C_Prcnt  D_Prcnt
0     0.24     0.62     0.12     0.49
1     0.74     0.62     0.00     0.87
2     0.62     0.12     0.37     0.00
3     0.99     0.37     0.24     0.62
4     0.00     0.00     0.74     0.99
5     0.37     0.87     0.99     0.74
6     0.49     0.49     0.87     0.24
7     0.12     0.24     0.49     0.37
8     0.87     0.75     0.62     0.12
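One possible explanation, offered as a sketch rather than a confirmed diagnosis: with axis=1, apply passes each row to the function as a Series, so x.A is a single number and every bin edge qcut computes for it is identical (31 for row 0, which matches the error). The example below assumes the intent is per-column percentiles. It builds the sample table inline instead of reading datafile.csv, lets apply pass whole columns (the default axis=0), and keeps duplicates='drop' because column B contains 78 twice. The one-argument percentile_convert and the name result are illustrative choices, not part of the original code.

```python
import pandas as pd

# Sample data copied from the table in the question
# (built inline here instead of being read from datafile.csv).
data = pd.DataFrame({
    "A": [31, 73, 59, 87, 13, 32, 35, 30, 85],
    "B": [78, 78, 24, 55, 9, 93, 72, 40, 85],
    "C": [10, 6, 26, 13, 32, 71, 63, 29, 31],
    "D": [35, 69, 0, 41, 97, 52, 10, 30, 2],
})

def percentile_convert(col):
    # col is a whole column (a Series of nine values), so qcut has
    # several distinct values to bin rather than a single scalar.
    # duplicates="drop" merges repeated bin edges caused by tied values.
    return pd.qcut(col, 100, labels=False, duplicates="drop") / 100

# Default axis=0 calls the function once per column;
# axis=1 would call it once per row.
result = data.apply(percentile_convert).add_suffix("_Prcnt")
print(result)
```

Calling the original four-column function directly on the frame, as in data = percentile_convert(data), should also avoid the per-row call, provided each qcut call includes duplicates='drop'.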