pd.qcut - ValueError: Bin edges must be unique

Question

My data is here.

q = pd.qcut(df['loss_percent'], 10)

ValueError: Bin edges must be unique: array([ 0.38461538,  0.38461538,  0.46153846,  0.46153846,  0.53846154,
        0.53846154,  0.53846154,  0.61538462,  0.69230769,  0.76923077,  1.        ])

I have read through why-use-pandas-qcut-return-valueerror, but I am still confused.

I imagine that one of my values has a high frequency of occurrence and that is breaking qcut.

First, how do I determine whether that is indeed the case, and which value is the problem? Second, what kind of solution is appropriate given my data?

I answered this question [here](http://stackoverflow.com/a/36883735/2336654) — piRSquared, Jan 05 '17 at 00:57
Possible duplicate of [pd.qcut with values that are inf (infinity) ValueError: Bin edges must be unique:](http://stackoverflow.com/questions/41475470/pd-qcut-with-values-that-are-inf-infinity-valueerror-bin-edges-must-be-unique) — Julien Marrec, Jan 05 '17 at 09:51

score 4 · Accepted Answer · edited May 23 '17 at 12:16

Using the solution from this post: https://stackoverflow.com/a/36883735/2336654

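import pandas as pd

def pct_rank_qcut(series, n):
    # Bin edges as fractions 0, 1/n, ..., 1
    edges = pd.Series([float(i) / n for i in range(n + 1)])
    # Position of the first edge that is >= the percentile rank
    f = lambda x: (edges >= x).argmax()
    return series.rank(pct=True).apply(f)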

q = pct_rank_qcut(df.loss_percent, 10)
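
To check whether a frequently repeated value is what breaks `qcut`, one option is to compute the decile edges directly and look for repeats. The following is a minimal sketch, not a definitive diagnosis: the series `s` is made-up data standing in for `df['loss_percent']`, and the names `edges`, `repeated_edges` and `counts` are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['loss_percent']: a few values repeated often.
s = pd.Series(
    [0.38461538] * 4 + [0.46153846] * 4 + [0.53846154] * 6
    + [0.61538462, 0.69230769, 0.76923077, 1.0],
    name="loss_percent",
)

# The 11 decile edges that pd.qcut(s, 10) would try to use.
edges = s.quantile(np.linspace(0, 1, 11))

# Edges that occur more than once are the ones that trigger the error.
repeated_edges = edges[edges.duplicated(keep=False)]

# Frequency of each distinct value, most common first.
counts = s.value_counts()

print(repeated_edges)
print(counts.head())
```

If `repeated_edges` is non-empty, the values it contains are the ones that span more than one quantile, and `counts` shows how often each of them occurs.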

answered Jan 05 '17 at 01:00

piRSquared

So this solution still cuts the data into equal blocks (deciles, in my case)? If so, why isn't qcut fixed to do exactly what your solution does? Just wondering... – codingknob Jan 05 '17 at 02:40
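
On the point about `qcut` itself: pandas 0.20 and later accept a `duplicates` argument in `pd.qcut`. With `duplicates='drop'`, repeated edges are merged instead of raising. The sketch below uses made-up data in place of `df['loss_percent']`. Note that this behaves differently from the rank-based function above, because merging edges can leave fewer bins than were requested.

```python
import pandas as pd

# Hypothetical data with heavy ties, standing in for df['loss_percent'].
s = pd.Series([0.5] * 6 + [0.1, 0.2, 0.8, 1.0])

# duplicates='drop' merges repeated edges instead of raising,
# so the result can have fewer than the 10 bins requested.
q = pd.qcut(s, 10, duplicates="drop")
n_bins = q.cat.categories.size

print(n_bins)
print(q.value_counts(sort=False))
```

Which approach fits depends on whether a fixed number of labels (the rank-based version) or interval bins built from the actual values (the `duplicates='drop'` version) matters more for the analysis.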
