1

I'm running into some odd behavior trying to group rows of a pandas dataframe by ID and then selecting out max/min datetimes (w/ timezones). This is with pandas 0.18.1 and numpy 1.11.1 (I saw in earlier posts a similar problem was apparently fixed w/ pandas 0.15).

Specifically, if I try:

print orders.groupby('OrderID')['start_time'].agg(np.min).iloc[:5]

I get:

OrderID
O161101XVS100000044   2016-11-01 12:03:12.920000-04:00
O161101XVS100000047   2016-11-01 12:03:36.693000-04:00
O161101XVS100000098   2016-11-01 12:09:08.330000-04:00
O161101XVS100000122   2016-11-01 12:09:59.950000-04:00
O161101XVS100000152   2016-11-01 12:11:29.790000-04:00
Name: start_time, dtype: datetime64[ns, US/Eastern]

Where the raw data had times closer to 8 am (US/Eastern). In other words, it reverted back to UTC times, even though it says it's eastern times, and has UTC-4 offset.

But if I instead try:

print orders.groupby('OrderID')['start_time'].agg(lambda x: np.min(x)).iloc[:5]

I now get:

OrderID
O161101XVS100000044   2016-11-01 08:03:12.920000-04:00
O161101XVS100000047   2016-11-01 08:03:36.693000-04:00
O161101XVS100000098   2016-11-01 08:09:08.330000-04:00
O161101XVS100000122   2016-11-01 08:09:59.950000-04:00
O161101XVS100000152   2016-11-01 08:11:29.790000-04:00
Name: start_time, dtype: datetime64[ns, US/Eastern]

Which is the behavior I intended. This second method is vastly slower, and I would have assumed the two approaches would yield identical results ...

Stephen Rauch
  • 47,830
  • 31
  • 106
  • 135
  • looks like a bug can you check if this has been reported already on https://github.com/pandas-dev/pandas/issues and if not, post a new issue – EdChum Feb 16 '17 at 15:12

1 Answers1

1

I can confirm this behavior. The problem is in pandas/types/cast/_possibly_downcast_to_dtype(). The calculations are done as an i8, and afterwards are converted back into a timezone aware datetime. But this line:

result = to_datetime(result).tz_localize(dtype.tz)

Needs to be this:

result = to_datetime(result).tz_localize('utc')
result = result.tz_convert(dtype.tz)

Update:

I have submitted a PR to address this issue.

Update 2:

PR has been merged, and should be in 0.20.0

Stephen Rauch
  • 47,830
  • 31
  • 106
  • 135