
When trying to get value counts for some columns in my DataFrame, I get an error saying the index must be monotonic, but the is_monotonic property says the index already is. Most columns in the DataFrame imported from a CSV don't raise this error, but a few do.

I've tried some of the tactics mentioned here, but can't seem to get it working.

Doing this:

import pandas as pd
data = pd.read_csv('info/train.csv')
print('Monotonic?: ', data['net_booking_value_monthly'].index.is_monotonic)
print(data['net_booking_value_monthly'].value_counts(dropna=False)[:10])

Gives me this:

Monotonic?:  True

Traceback (most recent call last):
  File "/Users/person/venvs/science/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3484, in get_slice_bound
    return self._searchsorted_monotonic(label, side)
  File "/Users/person/venvs/science/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3443, in _searchsorted_monotonic
    raise ValueError('index must be monotonic increasing or decreasing')
ValueError: index must be monotonic increasing or decreasing

During handling of the above exception, another exception occurred:

etc. etc.

It's doing my head in that the is_monotonic property is True, but the value count returns this error. The input CSV file is pretty big and I can't share it, but is there anything I should look for in there that would cause this?

Pandas version is 0.20.2.

matt
  • Try this? `data = data.reset_index(drop=True)` – cs95 Dec 13 '17 at 09:58
  • Cheers. Tried it. No dice unfortunately. Exact same error. – matt Dec 13 '17 at 10:07
  • Do you mind providing a data sample that accurately reproduces your problem? You could also try updating to the latest stable release - 0.21. – cs95 Dec 13 '17 at 10:08
  • OK so this CSV data produces the error: [link](https://justpaste.it/1eio1) If I enter this: `print('Vegetables are Monotonic?: ', data['vegetables'].index.is_monotonic) print(data['vegetables'].value_counts()) print('Values are Monotonic?: ', data['value'].index.is_monotonic) print(data['value'].value_counts()[) ` I get this: `Vegetables are Monotonic?: True 4.0 48 etc. Name: vegetables, dtype: int64 ValueError: index must be monotonic ` Weirdly, if I shorten the CSV to 99 lines, I get the error on the first column (vegetables) instead. – matt Dec 13 '17 at 11:00
  • I get no such errors. Did you upgrade? `pip install --upgrade pandas` – cs95 Dec 13 '17 at 11:03
  • Yup I'm now on 0.21.1. I'm bewildered. Might the full stack trace help? – matt Dec 13 '17 at 11:05
  • I don't think so. This might be a bug. Sorry, I'm in over my head here. :-) – cs95 Dec 13 '17 at 11:06
  • Thanks for giving it a shot. I tried changing the line endings on the csv file, and I tried saving with and without a blank line at the bottom. Some versions of the file lead to the error depending on the number of rows, but I can't isolate exactly what's causing it. Time to bust out Excel I guess. – matt Dec 13 '17 at 11:16
  • Instead of `data['net_booking_value_monthly'].value_counts(dropna=False)[:10]` try `data['net_booking_value_monthly'].value_counts(dropna=False).iloc[:10]`. I think your slicing might be causing the error. – Alex Riley Dec 13 '17 at 11:25
  • Yup that did it! Thanks a million. Is there any link you can recommend for some reading on the difference between the slicing I did and the version you used? Also any insight into why is_monotonic returns True despite the error would be nice. – matt Dec 13 '17 at 11:38
  • `data['net_booking_value_monthly']` and `data['net_booking_value_monthly'].value_counts(dropna=False)` will have different indexes, the latter not necessarily monotonic. I'm not sure exactly what's causing the error though. If `v = data['net_booking_value_monthly'].value_counts(dropna=False)` what is the output of `len(v)` and `v.index.dtype`? – Alex Riley Dec 13 '17 at 12:51
  • It is `4056` & `float64` respectively. `v.is_monotonic` gives `False`. Everything seems a lot clearer now. Thanks. – matt Dec 15 '17 at 13:59
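A minimal sketch of the diagnosis reached in the comments, using a small hypothetical float Series in place of the unshareable train.csv column. It assumes a recent pandas, where `Index.is_monotonic` has been removed in favour of `is_monotonic_increasing` / `is_monotonic_decreasing`.

```python
import pandas as pd

# Hypothetical stand-in for the 'net_booking_value_monthly' column.
s = pd.Series([3.5, 1.0, 3.5, 2.25, 1.0, 3.5, 7.0], name="net_booking_value_monthly")

# Like a column loaded with read_csv, it has a default RangeIndex 0..6,
# which is monotonic. This is the index the question's check looked at.
print(s.index.is_monotonic_increasing)

# value_counts() is indexed by the unique values, most frequent first,
# so its float64 index is not sorted in either direction here.
v = s.value_counts(dropna=False)
print(v.index.dtype)
print(v.index.is_monotonic_increasing, v.index.is_monotonic_decreasing)

# .iloc slices by position, so the order of the index does not matter.
print(v.iloc[:2])
```

The first check passing while the second fails is exactly the True/False mismatch discussed above.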

2 Answers


Because value_counts() returns the unique values as the INDEX.

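The problem is that value_counts() returns the unique values as the index and their frequencies as the values, ordered by frequency, so that index is generally not monotonic. Here the unique values are floats, and on a float index pandas treats [:10] as a label-based slice (from the start up to the label 10) rather than "the first 10 rows". That lookup fails on a non-monotonic index unless the exact label happens to exist. The is_monotonic check in the question was run on the original column's index, not on the value_counts() result, which is why it printed True.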
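To make the index/values swap concrete, here is a small sketch on hypothetical data (not the asker's CSV): the index of the result holds the original values, and the counts become the Series values.

```python
import pandas as pd

# Hypothetical data; any float column behaves the same way.
col = pd.Series([0.5, 12.0, 0.5, 3.0, 12.0, 0.5])

counts = col.value_counts()
# Index: the unique values, most frequent first
print(counts.index.tolist())
# Values: how often each one occurred
print(counts.tolist())
```

The index here is `[0.5, 12.0, 3.0]`, not the `0..n-1` integers pandas creates by default, so `[:10]` has no "first ten positions" meaning on it.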

Solution:

You can call .reset_index() after value_counts() and then slice the top 10 rows as you were doing:

data['net_booking_value_monthly'].value_counts(dropna=False).reset_index()[:10]
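A small sketch of why this works, on hypothetical data and slicing the top 2 instead of 10. The names of the resulting columns differ between pandas versions, so the example only relies on column positions.

```python
import pandas as pd

# Hypothetical stand-in for the 'net_booking_value_monthly' column.
col = pd.Series([0.5, 12.0, 0.5, 3.0, 12.0, 0.5], name="net_booking_value_monthly")

# reset_index() turns the Series into a DataFrame: the unique values move
# into the first column, the counts into the second, and the rows get a
# fresh 0..n-1 RangeIndex, so [:2] is an ordinary positional row slice.
top = col.value_counts(dropna=False).reset_index()[:2]
print(top)
print(top.index)
```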


I'd recommend trying this:

import pandas as pd
data = pd.read_csv('info/train.csv')
print('Monotonic?: ', data['net_booking_value_monthly'].index.is_monotonic)
data = data.sort_index()
print(data['net_booking_value_monthly'].value_counts(dropna=False)[:10])
Sergio G