I have been trying to qcut an array of values into 4 bins, but I am getting the error below. How can I solve this? I am a beginner in Python.

Question

Below is my data (wkx_old['Sales point'].values):

array([ 2, 2, 2, 4, 4, 3, 1, 4, 2, 1, 3, 4, 1, 1, 4, 7, 4, 1, 1, 2, 4, 3, 4, 3, 3, 2, 5, 2, 3, 2, 3, 4, 2, 10, 4, 4, 6, 3, 3, 1, 1, 2, 1, 3, 2, 4, 5, 2, 4, 3, 2, 3, 4, 3, 1, 1, 6, 3, 6, 5, 7, 2, 1, 1, 6, 5, 1, 1, 1, 2, 2, 1, 2, 2, 4, 4, 1, 5, 7, 2, 1, 2, 1, 5, 3, 1, 1, 2, 3, 3, 5, 4, 4, 6, 1, 4, 4, 1, 3, 4, 4, 5, 4, 4, 1, 1, 3, 1, 2, 1, 3, 7, 2, 1, 1, 3, 3, 6, 1, 6, 2, 3, 7, 1])

This is the code I am trying to run:

import pandas as pd

names = ['D', 'C', 'B', 'A']

wkx_old['Rankings'] = pd.qcut(wkx_old['Sales point'], q=4, labels=names)

The error I am getting: ValueError: Bin edges must be unique: array([ 1., 1., 3., 4., 10.]). You can drop duplicate edges by setting the 'duplicates' kwarg
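One way to see where those edges come from is to compute the same quantiles directly. This is a minimal sketch, assuming the values above are loaded into a plain Series; `sales` is a hypothetical stand-in for `wkx_old['Sales point']`:

```python
import pandas as pd

# Values copied from the question; `sales` stands in for wkx_old['Sales point']
sales = pd.Series(
    [2, 2, 2, 4, 4, 3, 1, 4, 2, 1, 3, 4, 1, 1, 4, 7, 4, 1, 1, 2,
     4, 3, 4, 3, 3, 2, 5, 2, 3, 2, 3, 4, 2, 10, 4, 4, 6, 3, 3, 1,
     1, 2, 1, 3, 2, 4, 5, 2, 4, 3, 2, 3, 4, 3, 1, 1, 6, 3, 6, 5,
     7, 2, 1, 1, 6, 5, 1, 1, 1, 2, 2, 1, 2, 2, 4, 4, 1, 5, 7, 2,
     1, 2, 1, 5, 3, 1, 1, 2, 3, 3, 5, 4, 4, 6, 1, 4, 4, 1, 3, 4,
     4, 5, 4, 4, 1, 1, 3, 1, 2, 1, 3, 7, 2, 1, 1, 3, 3, 6, 1, 6,
     2, 3, 7, 1],
    name="Sales point",
)

# pd.qcut(q=4) places its bin edges at the 0%, 25%, 50%, 75% and 100% quantiles
edges = sales.quantile([0, 0.25, 0.5, 0.75, 1.0]).tolist()
print(edges)  # [1.0, 1.0, 3.0, 4.0, 10.0]

# The value 1 alone covers more than a quarter of the rows,
# so the 0% and 25% quantiles are both 1 and the first edge repeats
share_of_ones = (sales == 1).mean()
print(f"share of 1s: {share_of_ones:.2f}")
```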

Check this out, I guess there is your answer: https://stackoverflow.com/questions/20158597/how-to-qcut-with-non-unique-bin-edges/40548606#40548606 — divingTobi, Apr 06 '21 at 14:20
Furthermore, your `names` list is too short. With `q=4` you cut into 5 segments (`q` being the number of cuts). Therefore names should be a list of 5 elements. — divingTobi, Apr 06 '21 at 14:23
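For reference, the pandas documentation describes `q` as the number of quantiles (10 for deciles, 4 for quartiles), so `q=4` requests 4 bins bounded by 5 edges. A small check on made-up, tie-free toy data (the values `0..7` are purely illustrative) suggests the question's 4-element `names` list fits `q=4` whenever the edges are unique:

```python
import pandas as pd

names = ["D", "C", "B", "A"]

# Eight distinct toy values: no ties, so all five quartile edges are unique
cats = pd.qcut(list(range(8)), q=4, labels=names)

print(len(cats.categories))  # 4
print(list(cats))            # ['D', 'D', 'C', 'C', 'B', 'B', 'A', 'A']
```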

score 2 · Accepted Answer · answered Apr 06 '21 at 14:14


qcut is not friendly with duplicated data: it throws an error when a duplicated value falls on a splitting point. Imagine doing a qcut on [1]*100: what is the 50th percentile?
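The `[1]*100` thought experiment can be run directly. In this sketch every quantile of a constant Series is 1, so all five edges coincide and `pd.qcut` raises the same kind of `ValueError` as in the question:

```python
import pandas as pd

ones = pd.Series([1] * 100)

# Every quantile of a constant series is the same value
print(ones.quantile([0, 0.25, 0.5, 0.75, 1.0]).tolist())  # [1.0, 1.0, 1.0, 1.0, 1.0]

raised = False
try:
    pd.qcut(ones, q=4)
except ValueError as err:
    # pandas refuses to build bins from non-unique edges
    raised = True
    print("ValueError:", err)
```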

You can try rank(pct=True) to calculate the actual percentile of each value, then use cut on the result:

wkx_old['Rankings'] = pd.cut(wkx_old['Sales point'].rank(pct=True), 
                             bins=4, labels=names)

Output:

0      C
1      C
2      C
3      B
4      B
      ..
119    A
120    C
121    C
122    A
123    D
Length: 124, dtype: category
Categories (4, object): ['D' < 'C' < 'B' < 'A']
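A self-contained sketch of this approach on the question's data may help (again, `sales` is a hypothetical stand-in for `wkx_old['Sales point']`). `rank(pct=True)` gives tied values the same average percentile rank, and `pd.cut` with `bins=4` splits the range of those ranks into 4 equal-width intervals, so its edges cannot repeat. The trade-off is that the groups are equal-width in percentile rather than equal-sized, so with this many ties their counts differ:

```python
import pandas as pd

sales = pd.Series(
    [2, 2, 2, 4, 4, 3, 1, 4, 2, 1, 3, 4, 1, 1, 4, 7, 4, 1, 1, 2,
     4, 3, 4, 3, 3, 2, 5, 2, 3, 2, 3, 4, 2, 10, 4, 4, 6, 3, 3, 1,
     1, 2, 1, 3, 2, 4, 5, 2, 4, 3, 2, 3, 4, 3, 1, 1, 6, 3, 6, 5,
     7, 2, 1, 1, 6, 5, 1, 1, 1, 2, 2, 1, 2, 2, 4, 4, 1, 5, 7, 2,
     1, 2, 1, 5, 3, 1, 1, 2, 3, 3, 5, 4, 4, 6, 1, 4, 4, 1, 3, 4,
     4, 5, 4, 4, 1, 1, 3, 1, 2, 1, 3, 7, 2, 1, 1, 3, 3, 6, 1, 6,
     2, 3, 7, 1],
    name="Sales point",
)
names = ["D", "C", "B", "A"]

# Percentile rank of each value; tied values share the average rank
pct = sales.rank(pct=True)

# Four equal-width intervals over the range of the percentile ranks
rankings = pd.cut(pct, bins=4, labels=names)

# Group sizes are unequal because the bins are equal-width, not equal-count
print(rankings.value_counts())

# Every occurrence of a given sales value receives the same label
print(pd.DataFrame({"Sales point": sales, "Rankings": rankings}).head())
```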

Quang Hoang

Thanks, but after making these changes I am still getting the error below: ValueError: Bin edges must be unique: array([0.13306452, 0.13306452, 0.5483871 , 0.73790323, 1. ]). You can drop duplicate edges by setting the 'duplicates' kwarg – Adarsh Rai Apr 06 '21 at 14:21
Thanks this really helped a lot. – Adarsh Rai Apr 06 '21 at 14:29
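The error in the first comment looks like what happens when the percentile ranks are passed to `pd.qcut` instead of `pd.cut`: every 1 in the data gets the same percentile rank (about 0.133), so the 0% and 25% quantiles of the ranks still coincide. If equal-sized groups are the goal, one alternative (not part of the answer above) is `rank(method='first')`, which breaks ties by order of appearance so every rank is unique. The cost is that identical sales values can then end up in different groups. A sketch under those assumptions:

```python
import pandas as pd

sales = pd.Series(
    [2, 2, 2, 4, 4, 3, 1, 4, 2, 1, 3, 4, 1, 1, 4, 7, 4, 1, 1, 2,
     4, 3, 4, 3, 3, 2, 5, 2, 3, 2, 3, 4, 2, 10, 4, 4, 6, 3, 3, 1,
     1, 2, 1, 3, 2, 4, 5, 2, 4, 3, 2, 3, 4, 3, 1, 1, 6, 3, 6, 5,
     7, 2, 1, 1, 6, 5, 1, 1, 1, 2, 2, 1, 2, 2, 4, 4, 1, 5, 7, 2,
     1, 2, 1, 5, 3, 1, 1, 2, 3, 3, 5, 4, 4, 6, 1, 4, 4, 1, 3, 4,
     4, 5, 4, 4, 1, 1, 3, 1, 2, 1, 3, 7, 2, 1, 1, 3, 3, 6, 1, 6,
     2, 3, 7, 1],
    name="Sales point",
)
names = ["D", "C", "B", "A"]

# qcut on tied percentile ranks still produces duplicate edges
qcut_on_pct_failed = False
try:
    pd.qcut(sales.rank(pct=True), q=4, labels=names)
except ValueError:
    qcut_on_pct_failed = True

# method="first" gives each row a distinct rank (ties ordered by position)
first_rank = sales.rank(method="first")
rankings = pd.qcut(first_rank, q=4, labels=names)

print(rankings.value_counts())  # 31 rows in each group for these 124 values

# Ties can be split: not every 1 lands in the same group
print(rankings[sales == 1].value_counts())
```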

score 0 · Answer 2 · answered Apr 06 '21 at 14:29

There are two problems with your code:

qcut tries to size the windows so that each window contains approximately the same number of elements. Because there are so many 1s in your data, it computes the bin edges array([ 1., 1., 3., 4., 10.]), as shown in the error message. The first two edges are identical, which leads to the error you see. To fix this, add the parameter duplicates='drop' to qcut:

pd.qcut(wkx_old['Sales point'], q=4, duplicates='drop')
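To see what `duplicates='drop'` actually does with this data, `retbins=True` also returns the edges that were used. In this sketch (with `sales` as a hypothetical stand-in for `wkx_old['Sales point']`), the repeated edge at 1 is removed, which leaves 4 edges and therefore only 3 bins rather than 4:

```python
import pandas as pd

sales = pd.Series(
    [2, 2, 2, 4, 4, 3, 1, 4, 2, 1, 3, 4, 1, 1, 4, 7, 4, 1, 1, 2,
     4, 3, 4, 3, 3, 2, 5, 2, 3, 2, 3, 4, 2, 10, 4, 4, 6, 3, 3, 1,
     1, 2, 1, 3, 2, 4, 5, 2, 4, 3, 2, 3, 4, 3, 1, 1, 6, 3, 6, 5,
     7, 2, 1, 1, 6, 5, 1, 1, 1, 2, 2, 1, 2, 2, 4, 4, 1, 5, 7, 2,
     1, 2, 1, 5, 3, 1, 1, 2, 3, 3, 5, 4, 4, 6, 1, 4, 4, 1, 3, 4,
     4, 5, 4, 4, 1, 1, 3, 1, 2, 1, 3, 7, 2, 1, 1, 3, 3, 6, 1, 6,
     2, 3, 7, 1],
    name="Sales point",
)

# retbins=True returns the bin edges alongside the binned values
binned, edges = pd.qcut(sales, q=4, duplicates="drop", retbins=True)

print(edges.tolist())              # [1.0, 3.0, 4.0, 10.0]
print(len(binned.cat.categories))  # 3

# The lowest bin absorbs every 1, 2 and 3, so it is much larger than the others
print(binned.value_counts(sort=False))
```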

The second problem is that your names list is 4 elements long, but you are cutting the data into 5 windows (q=4 is the number of cuts). To fix this, just add another element to the list:

names = ['E', 'D', 'C', 'B', 'A']
pd.qcut(wkx_old['Sales point'], q=4, duplicates='drop', labels=names)

This should then work.
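A label-count check may be worth running before relying on this. `pd.qcut` expects exactly one label per bin, i.e. `len(edges) - 1`. Since `duplicates='drop'` leaves only 3 bins for this data, a sketch like the following suggests that both the 5-name and the 4-name lists are rejected with a `ValueError`, while a 3-name list works (the names `'Low'`, `'Mid'`, `'High'` are hypothetical):

```python
import pandas as pd

sales = pd.Series(
    [2, 2, 2, 4, 4, 3, 1, 4, 2, 1, 3, 4, 1, 1, 4, 7, 4, 1, 1, 2,
     4, 3, 4, 3, 3, 2, 5, 2, 3, 2, 3, 4, 2, 10, 4, 4, 6, 3, 3, 1,
     1, 2, 1, 3, 2, 4, 5, 2, 4, 3, 2, 3, 4, 3, 1, 1, 6, 3, 6, 5,
     7, 2, 1, 1, 6, 5, 1, 1, 1, 2, 2, 1, 2, 2, 4, 4, 1, 5, 7, 2,
     1, 2, 1, 5, 3, 1, 1, 2, 3, 3, 5, 4, 4, 6, 1, 4, 4, 1, 3, 4,
     4, 5, 4, 4, 1, 1, 3, 1, 2, 1, 3, 7, 2, 1, 1, 3, 3, 6, 1, 6,
     2, 3, 7, 1],
    name="Sales point",
)

# Find out how many bins survive after duplicate edges are dropped
_, edges = pd.qcut(sales, q=4, duplicates="drop", retbins=True)
n_bins = len(edges) - 1
print(n_bins)  # 3

# Label lists whose length differs from the number of bins are rejected
rejected = []
for labels in (["E", "D", "C", "B", "A"], ["D", "C", "B", "A"]):
    try:
        pd.qcut(sales, q=4, duplicates="drop", labels=labels)
    except ValueError as err:
        rejected.append(len(labels))
        print(f"{len(labels)} labels -> ValueError: {err}")

# Hypothetical names: exactly one per remaining bin
names = ["Low", "Mid", "High"]
rankings = pd.qcut(sales, q=4, duplicates="drop", labels=names)
print(rankings.value_counts(sort=False))
```

Because the number of surviving bins depends on the data, computing `n_bins` first and building the label list from it is one way to avoid hard-coding a length that may not match.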