
My data is here.

q = pd.qcut(df['loss_percent'], 10)

ValueError: Bin edges must be unique: array([ 0.38461538,  0.38461538,  0.46153846,  0.46153846,  0.53846154,
        0.53846154,  0.53846154,  0.61538462,  0.69230769,  0.76923077,  1.        ])

I have read through why-use-pandas-qcut-return-valueerror, but I am still confused.

I imagine that one of my values has a high frequency of occurrence and that is breaking qcut.

First, how do I determine whether that is indeed the case, and which value is the problem? Second, what kind of solution is appropriate given my data?

codingknob
  • I answered this question [here](http://stackoverflow.com/a/36883735/2336654) – piRSquared Jan 05 '17 at 00:57
  • Possible duplicate of [pd.qcut with values that are inf (infinity) ValueError: Bin edges must be unique:](http://stackoverflow.com/questions/41475470/pd-qcut-with-values-that-are-inf-infinity-valueerror-bin-edges-must-be-unique) – Julien Marrec Jan 05 '17 at 09:51

1 Answer


Using the solution in the post https://stackoverflow.com/a/36883735/2336654

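import pandas as pd

def pct_rank_qcut(series, n):
    # Bucket boundaries on the percentile scale: 0, 1/n, ..., 1.
    edges = pd.Series([float(i) / n for i in range(n + 1)])
    # Position of the first boundary at or above a given percentile rank.
    f = lambda x: (edges >= x).argmax()
    return series.rank(pct=True).apply(f)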

q = pct_rank_qcut(df.loss_percent, 10)
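
To check whether a high-frequency value is really what breaks qcut, one rough approach is to compute the decile edges yourself and look for repeats. The sketch below uses made-up data standing in for df['loss_percent'] (the real data is not shown in the question), so the names loss_percent, repeated_edges and counts are only illustrative. It also shows the duplicates="drop" option of pd.qcut, which merges repeated edges at the cost of returning fewer than n bins.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['loss_percent']: a few heavily repeated values.
loss_percent = pd.Series(
    [0.38461538] * 30 + [0.46153846] * 25 + [0.53846154] * 28
    + [0.61538462] * 7 + [0.69230769] * 4 + [0.76923077] * 4 + [1.0] * 2
)

n = 10

# The candidate bin edges are the 0%, 10%, ..., 100% quantiles.
edges = loss_percent.quantile(np.linspace(0, 1, n + 1))

# Any edge value that occurs more than once is what makes qcut raise.
repeated_edges = edges[edges.duplicated(keep=False)]
print(repeated_edges)

# Frequency of each value, most common first, to see which ones dominate.
counts = loss_percent.value_counts()
print(counts.head())

# One possible workaround: let qcut drop the repeated edges.
# The result then has fewer than n bins.
q = pd.qcut(loss_percent, n, duplicates="drop")
print(q.cat.categories)
```

If repeated_edges is empty, ties are not the cause and something else (for example inf or NaN values) is worth checking.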
piRSquared
  • So this solution still cuts the data into equal blocks (my case decile?). If that is the case then why isn't qcut fixed to do exactly as per your solution? Just wondering... – codingknob Jan 05 '17 at 02:40
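
On the question of equal blocks: a small experiment suggests the rank-based function does not produce equal-sized buckets when values are tied, because tied values share one average percentile rank and therefore land in the same bucket. The sketch below uses invented data (the series s is hypothetical) and contrasts this with ranking by method='first' before calling qcut, which breaks ties by order of appearance and so does give equal-sized buckets, at the cost of splitting identical values across buckets.

```python
import pandas as pd

def pct_rank_qcut(series, n):
    # Same function as in the answer, repeated so this sketch is self-contained.
    edges = pd.Series([float(i) / n for i in range(n + 1)])
    f = lambda x: (edges >= x).argmax()
    return series.rank(pct=True).apply(f)

# Hypothetical data: one value repeated six times, then four distinct values.
s = pd.Series([0.1] * 6 + [0.2, 0.3, 0.4, 0.5])

# Rank-based buckets: all six tied values end up in the same bucket.
buckets = pct_rank_qcut(s, 10)
sizes = buckets.value_counts().sort_index()
print(sizes)

# Alternative: break ties by order of appearance, then cut the ranks.
# Buckets are equal-sized, but identical values may fall in different buckets.
first_rank_buckets = pd.qcut(s.rank(method="first"), 5, labels=False)
print(first_rank_buckets.value_counts().sort_index())
```

Which behaviour is appropriate depends on whether identical values are allowed to fall into different buckets.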