I don't understand when and why this error is raised.

From my understanding, resample should create as many bins as needed in order to bin all the timestamps of the index. So the message "Values falls before first bin" does not make much sense to me.

Example/actual output:

>>> import pandas as pd
>>> df = pd.DataFrame(index=pd.date_range(start='2021-04-22 01:00:00', end='2021-04-28 01:00', freq='1d'), data=[1]*7)
>>> df 
                     0
2021-04-22 01:00:00  1
2021-04-23 01:00:00  1
2021-04-24 01:00:00  1
2021-04-25 01:00:00  1
2021-04-26 01:00:00  1
2021-04-27 01:00:00  1
2021-04-28 01:00:00  1
>>> df.resample(rule='7d', origin='2021-04-29 00:00:00', closed='right', label='right').sum()
[...]
ValueError: Values falls before first bin

Expected output:

>>> df.resample(rule='7d', origin='2021-04-29 00:00:00', closed='right', label='right').sum() 
            0
2021-04-29  7 # bin (2021-04-22 00:00:00, 2021-04-29 00:00:00]

I'm using pandas 1.3.5

actual_panda
  • Just to add information: a possibly related [issue on GitHub](https://github.com/pandas-dev/pandas/issues/44957) – Ric S Dec 21 '21 at 13:50

2 Answers

From this question I learned that the timestamps are likely truncated to the unit given in the rule argument (here, days) before they are sorted into bins.

This means that

  1. 2021-04-22 01:00:00 is truncated to 2021-04-22 00:00:00
  2. 2021-04-22 00:00:00 does not fit into the bin (2021-04-22 00:00:00, 2021-04-29 00:00:00] because the left edge of that bin is open, which leads to the ValueError
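The interval logic in the two steps above can be checked with a small sketch. It does not show what resample does internally (the truncation is only my guess); it just tests membership in the expected bin using pd.Interval. The names expected_bin, first and truncated are made up for this illustration.

```python
import pandas as pd

# The single bin the question expects: open on the left, closed on the right.
expected_bin = pd.Interval(
    pd.Timestamp("2021-04-22 00:00:00"),
    pd.Timestamp("2021-04-29 00:00:00"),
    closed="right",
)

first = pd.Timestamp("2021-04-22 01:00:00")
truncated = first.floor("D")  # drop the time of day -> 2021-04-22 00:00:00

print(first in expected_bin)      # True: 01:00 lies strictly after the left edge
print(truncated in expected_bin)  # False: midnight sits on the open left edge
```

So the original timestamp belongs to the bin, and only the truncated one falls outside it.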

To my eyes this looks like a bug or misfeature. At least one of "truncate timestamps before binning" or "don't add bins as needed, instead raise error" seems to be wrong.
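As a possible way around the error, here is a minimal sketch that computes the right-closed bin labels by hand and groups on them instead of calling resample. This is not how pandas implements resample; origin, step, to_right_edge and right_labels are names I introduce here, and the approach assumes a fixed-length rule such as 7 days.

```python
import pandas as pd

df = pd.DataFrame(
    index=pd.date_range(start="2021-04-22 01:00:00", end="2021-04-28 01:00", freq="1D"),
    data=[1] * 7,
)

origin = pd.Timestamp("2021-04-29 00:00:00")
step = pd.Timedelta(days=7)

# Distance from each timestamp up to the next bin edge aligned with origin.
# A timestamp exactly on an edge gets distance 0, so it is labelled with that
# edge, which matches closed='right', label='right'.
to_right_edge = (origin - df.index) % step
right_labels = df.index + to_right_edge

manual = df.groupby(right_labels).sum()
print(manual)  # one row, labelled 2021-04-29, with sum 7
```

For the data in the question this gives the single bin (2021-04-22 00:00:00, 2021-04-29 00:00:00] that was expected.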

actual_panda

I found that normalizing the timestamps before resampling helps: time = time.dt.normalize()
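A minimal sketch of what that call does, assuming time is a pandas Series of datetimes (the sample values below are taken from the question, not from my own data):

```python
import pandas as pd

# Hypothetical datetime Series standing in for `time`.
time = pd.Series(pd.date_range("2021-04-22 01:00:00", periods=3, freq="1D"))

# normalize() sets every timestamp to midnight of the same calendar day.
time = time.dt.normalize()
print(time)
```

Note that a normalized value such as 2021-04-22 00:00:00 sits exactly on the open left edge of the bin (2021-04-22 00:00:00, 2021-04-29 00:00:00], so with closed='right' it may be counted in the previous bin rather than the one expected in the question.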

Hanan Shteingart