Inconsistent behavior of pandas DatetimeIndex.round?

Question

I ran into some very unexpected behavior in the round method of pandas.DatetimeIndex:

import pandas as pd
import datetime as dt
t1 = pd.DatetimeIndex([dt.datetime(2013,12,5,1,30,0),
                       dt.datetime(2013,12,5,2,30,0),
                       dt.datetime(2013,12,5,3,30,0),
                       dt.datetime(2013,12,5,4,30,0)])  
print(t1)

gives:

DatetimeIndex(['2013-12-05 01:30:00', '2013-12-05 02:30:00',
               '2013-12-05 03:30:00', '2013-12-05 04:30:00'],
              dtype='datetime64[ns]', freq=None)

So far, so good. Now I want to round to the nearest full hour. I don't mind whether a tie goes to the next or the previous hour, but I need consistent behavior.

t2 = t1.round('H')
print(t2)

Surprisingly I get:

DatetimeIndex(['2013-12-05 02:00:00', '2013-12-05 02:00:00',
               '2013-12-05 04:00:00', '2013-12-05 04:00:00'],
              dtype='datetime64[ns]', freq=None)

Entries 1 and 3 were rounded up while entries 2 and 4 were rounded down. Is this the intended behavior? I suspect some numerical detail under the hood, but it is really disturbing. In my case the temporal resolution is limited to minutes, so I can add (or subtract) 1 s to every time and get the desired result. But this can't be the right way to do it.

ALollz · Accepted Answer · 2020-06-12T16:13:59.970

Many people learn the "round half up" rule, under which 1.5 rounds to 2, 2.5 rounds to 3, and so on. NumPy does not round this way. From the numpy.around documentation:

For values exactly halfway between rounded decimal values, NumPy rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc.

If you think of your times as fractional hours (1.5, 2.5, 3.5, 4.5), this is exactly the expected behavior:

import numpy as np

np.around([1.5, 2.5, 3.5, 4.5])
#array([2., 2., 4., 4.])

(pandas follows the same convention: DatetimeIndex.round uses RoundTo.NEAREST_HALF_EVEN internally.)
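To make the "fractional hours" view concrete, here is a minimal sketch that converts t1 to fractional hours since midnight, rounds them with np.round (which, like np.around, sends exact halves to the nearest even value) and converts them back; the result should match t1.round. It uses the lowercase "h" alias, since recent pandas versions deprecate "H" in favour of "h".

```python
import datetime as dt

import numpy as np
import pandas as pd

t1 = pd.DatetimeIndex([dt.datetime(2013, 12, 5, h, 30) for h in range(1, 5)])

# Express each timestamp as fractional hours since midnight
midnight = t1.normalize()
hours = ((t1 - midnight) / pd.Timedelta(hours=1)).to_numpy()

# np.round sends exact halves to the nearest even value
rounded_hours = np.round(hours)

# Convert the rounded hours back to timestamps
reconstructed = midnight + pd.to_timedelta(rounded_hours, unit="h")

print(hours.tolist())          # [1.5, 2.5, 3.5, 4.5]
print(rounded_hours.tolist())  # [2.0, 2.0, 4.0, 4.0]
print(list(reconstructed) == list(t1.round("h")))  # True
```

The round trip reproduces the "inconsistent" output exactly, which shows it is half-to-even rounding rather than floating-point noise.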

So how do you round half up for datetimes at a given frequency?

Buried deep in pandas is a RoundTo enumeration of rounding modes, and the mode we want is RoundTo.NEAREST_HALF_PLUS_INFTY (ties go toward +infinity, i.e. up). Datetimes add a complication, but again pandas already handles it in its round_nsint64 function, so import that as well.

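# Private pandas internals, not public API: the module path and the
# signature of round_nsint64 may differ between pandas versions
from pandas._libs.tslibs.timestamps import RoundTo, round_nsint64

# Round the underlying int64 nanoseconds, sending ties toward +infinity
rounded = round_nsint64(t1.view('i8'), RoundTo.NEAREST_HALF_PLUS_INFTY, 'H')

# Convert back to datetime
pd.DatetimeIndex(rounded)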
#DatetimeIndex(['2013-12-05 02:00:00', '2013-12-05 03:00:00',
#               '2013-12-05 04:00:00', '2013-12-05 05:00:00'],
#              dtype='datetime64[ns]', freq=None)
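If you would rather not depend on private internals, round half up can also be expressed with public API only, via the identity round_half_up(x) = floor(x + step / 2). This is a minimal sketch, not the method from the answer above: round_half_up is a hypothetical helper name, and it assumes a tz-naive index and a fixed-length frequency string such as "1h" or "15min".

```python
import datetime as dt

import pandas as pd


def round_half_up(index: pd.DatetimeIndex, freq: str) -> pd.DatetimeIndex:
    """Round to the nearest multiple of freq, sending exact ties upward.

    Hypothetical helper: assumes a tz-naive index and a fixed-length
    frequency string such as "1h" or "15min".
    """
    # Shift by half a step, then floor: an exact tie lands on the next
    # boundary and stays there, anything else rounds to the nearest one
    half_step = pd.Timedelta(freq) / 2
    return (index + half_step).floor(freq)


t1 = pd.DatetimeIndex([dt.datetime(2013, 12, 5, h, 30) for h in range(1, 5)])
t2 = round_half_up(t1, "1h")
print([str(ts) for ts in t2])
# ['2013-12-05 02:00:00', '2013-12-05 03:00:00',
#  '2013-12-05 04:00:00', '2013-12-05 05:00:00']
```

Unlike adding 1 s to every time, this does not rely on the data being limited to minute resolution.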

Thanks. I only consulted the pandas docs, not numpy's. Since I don't see a way around this, I'll stick with my hack of adding 1 s to every time and keep an eye on it. - Durtal, Jun 12 '20 at 15:56
@Durtal see the update: you can use pandas to do this properly if you're willing to import some rather specific methods buried deep in the library. - ALollz, Jun 12 '20 at 16:06
@ALollz I might need to create a second account just so I can upvote this again. - It_is_Chris, Sep 02 '21 at 21:09