I'm trying to convert dataset values into percentiles. I wrote a function for this, but it raises an error, whereas the same code works when I run it on its own, outside the function. Could someone please help me figure out why the code fails inside the function? Thank you so much for your help.

I have the following dataset:

    A   B   C   D
0  31  78  10  35
1  73  78   6  69
2  59  24  26   0
3  87  55  13  41
4  13   9  32  97
5  32  93  71  52
6  35  72  63  10
7  30  40  29  30
8  85  85  31   2

And I wanted to get percentiles for each value with the following function:

import pandas as pd
data = pd.read_csv('datafile.csv')

def percentile_convert(x):
    x['A_Prcnt'] = pd.qcut(x.A, 100, labels=False) / 100
    x['B_Prcnt'] = pd.qcut(x.B, 100, labels=False) / 100
    x['C_Prcnt'] = pd.qcut(x.C, 100, labels=False) / 100
    x['D_Prcnt'] = pd.qcut(x.D, 100, labels=False) / 100
    x = x[['A_Prcnt', 'B_Prcnt', 'C_Prcnt', 'D_Prcnt']]
    return x

data = data.apply(percentile_convert, axis=1)

Once I run this, I get the following error:

ValueError: ("Bin edges must be unique: array([31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,\n       31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,\n       31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,\n       31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,\n       31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,\n       31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31]).\nYou can drop duplicate edges by setting the 'duplicates' kwarg", 'occurred at index 0')

But if I run the same code outside of a function, like this:

data['A_Prcnt'] = pd.qcut(data.A, 100, labels=False, duplicates='drop') / 100
data['B_Prcnt'] = pd.qcut(data.B, 100, labels=False, duplicates='drop') / 100
data['C_Prcnt'] = pd.qcut(data.C, 100, labels=False, duplicates='drop') / 100
data['D_Prcnt'] = pd.qcut(data.D, 100, labels=False, duplicates='drop') / 100

data = data[['A_Prcnt', 'B_Prcnt', 'C_Prcnt', 'D_Prcnt']]
print(data)

I get back the desired result, which is:

   A_Prcnt  B_Prcnt  C_Prcnt  D_Prcnt
0     0.24     0.62     0.12     0.49
1     0.74     0.62     0.00     0.87
2     0.62     0.12     0.37     0.00
3     0.99     0.37     0.24     0.62
4     0.00     0.00     0.74     0.99
5     0.37     0.87     0.99     0.74
6     0.49     0.49     0.87     0.24
7     0.12     0.24     0.49     0.37
8     0.87     0.75     0.62     0.12

1 Answer

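The code is not the same in both cases. The version inside the function leaves out

duplicates='drop'

in the pd.qcut calls. Note that the keyword is duplicates, as the error message says; pd.qcut has no drop_duplicates argument.

There is a second difference: data.apply(percentile_convert, axis=1) calls the function once per row, so inside the function x.A is a single value (31 for row 0) rather than the whole column. That is why every bin edge in the error message is 31 and why the error "occurred at index 0". The standalone version passes entire columns to pd.qcut, so call the function on the DataFrame directly instead of going through apply.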
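For illustration, here is a minimal sketch of the function called on the whole DataFrame instead of row by row through apply, with duplicates='drop' passed to pd.qcut. The DataFrame is built inline from the values in the question so the snippet runs without datafile.csv; the loop over columns and the names df, out and result are my own choices, not part of the original code.

```python
import pandas as pd

# Same values as the table in the question, built inline
# so the example does not depend on datafile.csv.
data = pd.DataFrame({
    "A": [31, 73, 59, 87, 13, 32, 35, 30, 85],
    "B": [78, 78, 24, 55, 9, 93, 72, 40, 85],
    "C": [10, 6, 26, 13, 32, 71, 63, 29, 31],
    "D": [35, 69, 0, 41, 97, 52, 10, 30, 2],
})


def percentile_convert(df):
    """Return a new DataFrame with one <col>_Prcnt column per input column."""
    out = pd.DataFrame(index=df.index)
    for col in df.columns:
        # qcut needs the whole column to compute quantile edges;
        # duplicates='drop' merges bins whose edges coincide.
        bins = pd.qcut(df[col], 100, labels=False, duplicates="drop")
        out[f"{col}_Prcnt"] = bins / 100
    return out


# Call the function on the DataFrame itself, not via data.apply(..., axis=1).
result = percentile_convert(data)
print(result)
```

Because the function receives the full DataFrame, each pd.qcut call sees all nine values of a column, just as in the standalone version from the question.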