
I am trying to split time series data into labelled segments like this:

import pandas as pd
import numpy as np

# Create example DataFrame of stock values
df = pd.DataFrame({
    'ticker':np.repeat( ['aapl','goog','yhoo','msft'], 25 ),
    'date':np.tile( pd.date_range('1/1/2011', periods=25, freq='D'), 4 ),
    'price':(np.random.randn(100).cumsum() + 10) })

# Cut the date into sections 
today = df['date'].max()
bin_edges = [pd.Timestamp.min, today - pd.Timedelta('14 days'), today - pd.Timedelta('7 days'), pd.Timestamp.max]
df['Time Group'] = pd.cut(df['date'], bins=bin_edges, labels=['history', 'previous week', 'this week'])

But I am getting an error, even though bin_edges does seem to be increasing monotonically:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-42-00524c0a883b>", line 13, in <module>
    df['Time Group'] = pd.cut(df['date'], bins=bin_edges, labels=['history', 'previous week', 'this week'])
  File "C:\Anaconda3\lib\site-packages\pandas\core\reshape\tile.py", line 228, in cut
    raise ValueError('bins must increase monotonically.')
ValueError: bins must increase monotonically.


In[43]: bin_edges
Out[43]: 
[Timestamp('1677-09-21 00:12:43.145225'),
 Timestamp('2011-01-11 00:00:00'),
 Timestamp('2011-01-18 00:00:00'),
 Timestamp('2262-04-11 23:47:16.854775807')]

Why is this happening?

pyjamas

2 Answers


This is a bug in pandas. Your edges need to be converted to numeric values (integer nanoseconds since the epoch) in order to perform the cut, and by using pd.Timestamp.min and pd.Timestamp.max you're essentially setting the outer edges at the lower and upper bounds of what can be represented by a 64-bit integer. Differencing the edges to check for monotonicity then overflows, which makes the edges look as if they are not monotonically increasing.

Demonstration of the overflow:

In [2]: bin_edges_numeric = [t.value for t in bin_edges]

In [3]: bin_edges_numeric
Out[3]:
[-9223372036854775000,
 1294704000000000000,
 1295308800000000000,
 9223372036854775807]

In [4]: np.diff(bin_edges_numeric)
Out[4]:
array([-7928668036854776616,      604800000000000,  7928063236854775807],
      dtype=int64)
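
As a rough illustration of why the check trips, the sketch below reuses the integer values shown above and compares two ways of testing for increasing edges. The variable names are mine, and the pairwise comparison is only one possible overflow-free check, not necessarily what pandas does internally.

```python
import numpy as np

# Integer nanosecond values copied from the output above.
edges = np.array([-9223372036854775000,
                  1294704000000000000,
                  1295308800000000000,
                  9223372036854775807], dtype=np.int64)

# Subtracting neighbours wraps around in int64, so the first gap comes out negative.
wrapped = np.diff(edges)
diff_says_increasing = bool((wrapped > 0).all())

# Comparing neighbours directly involves no subtraction, so nothing can overflow.
pairwise_says_increasing = bool((edges[1:] > edges[:-1]).all())

print(diff_says_increasing, pairwise_says_increasing)
```

The edges really are increasing; only the subtraction-based test reports otherwise.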

Until this is fixed, my recommendation is to use lower and upper edges that are closer to your actual dates but still achieve the same end result:

first = df['date'].min()
today = df['date'].max()
bin_edges = [first - pd.Timedelta('1000 days'), today - pd.Timedelta('14 days'),
             today - pd.Timedelta('7 days'), today + pd.Timedelta('1000 days')]

I picked 1000 days arbitrarily, and you could choose a different value as you see fit. With these modifications the cut should not raise an error.
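
For completeness, here is a minimal end-to-end sketch of the workaround on a single 25-day date range like the one in the question. The names `dates`, `pad`, and `groups` are placeholders of mine, and the padding is just as arbitrary as above.

```python
import pandas as pd

# One ticker's worth of dates, matching the question's example range.
dates = pd.Series(pd.date_range('1/1/2011', periods=25, freq='D'))

first = dates.min()
today = dates.max()
pad = pd.Timedelta('1000 days')  # arbitrary padding, far from the int64 limits

bin_edges = [first - pad, today - pd.Timedelta('14 days'),
             today - pd.Timedelta('7 days'), today + pad]

# Bins are right-closed by default, so a date exactly 14 days back counts as 'history'.
groups = pd.cut(dates, bins=bin_edges,
                labels=['history', 'previous week', 'this week'])
print(groups.value_counts(sort=False))
```

Every date should land in one of the three groups, since the padded outer edges sit well outside the data.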

root
  • Thanks for the explanation. Do you know if there's an open issue on this bug? I couldn't find it. – pyjamas Apr 10 '19 at 19:02
  • I don't think so - I couldn't find one after a quick search and don't recall this coming up in the past. You can create an issue if you want, otherwise I can do it later today. I _think_ I know the fix to this so should be able to get it out in the next release. – root Apr 10 '19 at 19:14
  • Hi @root, I guess this bug has not been resolved yet. I still get the same error because `np.diff` encounters a negative value. I am trying to bin a dataframe using decreasing bin values, so the difference is naturally negative. – 7bStan Oct 14 '19 at 07:47

I was also getting the same error, but none of the answers on Stack Overflow helped in my case. Posting here for the benefit of others who end up here searching for an answer.

bins must be given in ascending order. In my case they were in descending order, which produced the same error: "ValueError: bins must increase monotonically".

I resolved it by changing the order to ascending.
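
A small sketch of that situation, using made-up values and edges (none of these numbers come from my real data): descending edges raise the error, and sorting them first avoids it.

```python
import pandas as pd

values = pd.Series([1, 5, 12, 25])   # hypothetical data
descending_edges = [30, 20, 10, 0]   # hypothetical edges, largest first

# Descending edges are rejected by pd.cut with a ValueError.
try:
    pd.cut(values, bins=descending_edges)
    raised = False
except ValueError:
    raised = True

# Sorting the edges into ascending order makes the same call work.
labels = ['low', 'mid', 'high']      # labels listed in ascending bin order
binned = pd.cut(values, bins=sorted(descending_edges), labels=labels)
print(raised, list(binned))
```

If the labels were originally written to match the descending edges, they need to be reversed as well so that each label still lines up with its bin.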
