
When I run the code below, I get the error below after I try to reset the index the second time. Does anyone see what the issue might be? I have other, similarly structured dataframes, differing only in their values, for which it runs without issue. Is it a caching problem?

Data:

print df[1:3]

 stf_id THING thnid div_id prd preference_THING
1 123 turtle 223 3 0.1
2 223 bug 335 4 0.9
3 443 cat 221 5 0.6




Code:

import pandas as pd

# Constants
STFID = 'stf_id'
THING='THING'
THNID='thnid'
DIV_ID='div_id'
PRD='prd'
THING_PREF = 'preference_THING'

df2_STFID_tot_prod = df2.groupby(STFID)[PRD].sum()

# df2.reset_index(drop=True)
df2.reset_index(inplace=True)

df2.set_index(STFID, inplace=True)

df2['tot_products'] = df2_STFID_tot_prod
df2['rel_products'] = df2[PRD].div(df2['tot_products'])
df2['tot_products_gr'] = pd.qcut(df2_STFID_tot_prod, 25, range(25))

df2.reset_index(inplace=True)


Error:

ValueError                                Traceback (most recent call last)
<ipython-input-40-aeea358a4589> in <module>()
     28 df2['tot_products'] = df2_STFID_tot_prod
     29 df2['rel_products'] = df2[PRD].div(df2['tot_products'])
---> 30 df2['tot_products_gr'] = pd.qcut(df2_STFID_tot_prod, 25, range(25))
     31 
     32 df2.reset_index(inplace=True)

/data2/lk123/anaconda2/lib/python2.7/site-packages/pandas/core/reshape/tile.pyc in qcut(x, q, labels, retbins, precision, duplicates)
    206     fac, bins = _bins_to_cuts(x, bins, labels=labels,
    207                               precision=precision, include_lowest=True,
--> 208                               dtype=dtype, duplicates=duplicates)
    209 
    210     return _postprocess_for_cut(fac, bins, retbins, x_is_series,

/data2/lk123/anaconda2/lib/python2.7/site-packages/pandas/core/reshape/tile.pyc in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates)
    232             raise ValueError("Bin edges must be unique: {bins!r}.\nYou "
    233                              "can drop duplicate edges by setting "
--> 234                              "the 'duplicates' kwarg".format(bins=bins))
    235         else:
    236             bins = unique_bins

ValueError: Bin edges must be unique: array([ 1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,
        2,  2,  2,  2,  2,  3,  4,  4, 13]).
You can drop duplicate edges by setting the 'duplicates' kwarg
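To see why the edges collide: `pd.qcut(x, 25)` places its 26 bin edges at the 0%, 4%, ..., 100% quantiles of `x`, so when most per-`stf_id` totals are tied at 1 or 2, many of those quantiles come out equal. Below is a minimal sketch, assuming a hypothetical `tot_prod` Series whose ties resemble the edges in the traceback (it is not the asker's real data):

```python
import numpy as np
import pandas as pd

# Hypothetical per-stf_id totals: heavy ties at 1 and 2, plus a few larger values.
tot_prod = pd.Series([1] * 30 + [2] * 60 + [3, 4, 4, 13], name="prd")

# qcut(x, 25) uses the 0/25, 1/25, ..., 25/25 quantiles of x as bin edges.
edges = tot_prod.quantile(np.linspace(0, 1, 26)).to_numpy()
print(edges)
print("unique edges:", np.unique(edges).size, "of", edges.size)

# With the default duplicates='raise', repeated edges trigger the same ValueError.
try:
    pd.qcut(tot_prod, 25, labels=range(25))
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The question's other dataframes presumably have enough distinct totals that all 26 quantiles differ, which would explain why the same code runs for them; caching is not involved.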
modLmakur
  • The error is being raised by `pd.qcut(df2_STFID_tot_prod, 25, range(25))`, not `df2.reset_index()`. [Luca's answer](https://stackoverflow.com/a/40548606/190597) on the linked page explains why it is happening and shows a number of ways to fix the issue. – unutbu Mar 15 '18 at 17:12
  • See also github issue [#7751](https://github.com/pandas-dev/pandas/issues/7751) and [#15069](https://github.com/pandas-dev/pandas/issues/15069). – unutbu Mar 15 '18 at 17:43
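Two common ways around the error, sketched on the same hypothetical tied data (`tot_prod` stands in for `df2_STFID_tot_prod`): pass `duplicates='drop'`, which the `qcut` signature in the traceback shows is available in the asker's pandas version, or rank the values first so no two are tied. Both are standard pandas calls; which one fits depends on whether tied totals must share a group.

```python
import pandas as pd

# Hypothetical stand-in for df2_STFID_tot_prod, with many tied totals.
tot_prod = pd.Series([1] * 30 + [2] * 60 + [3, 4, 4, 13], name="prd")

# Option 1: merge duplicate edges. Fewer than 25 bins survive, so a fixed
# labels=range(25) no longer matches; labels=False returns integer bin codes.
groups_drop = pd.qcut(tot_prod, 25, labels=False, duplicates="drop")

# Option 2: break ties by order of appearance before binning. Every edge is
# then unique and exactly 25 groups exist, but tied totals may be split.
groups_rank = pd.qcut(tot_prod.rank(method="first"), 25, labels=range(25))

print(groups_drop.nunique(), groups_rank.nunique())
```

Dropping duplicates yields fewer than 25 groups, while ranking keeps 25 roughly equal-sized groups at the cost of splitting tied totals across neighboring groups. Because `rank()` preserves the `stf_id` index, the ranked version should still align when assigned to `df2['tot_products_gr']` as in the question.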
