I have a pandas.DataFrame called df which has an automatically generated index, with a column dt:

df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))

What I'd like to do is create a new column truncated to hour precision. I'm currently using:

from datetime import datetime
df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))

This works, so that's fine. However, I've an inkling there's some nice way using pandas.tseries.offsets or creating a DatetimeIndex or similar.

So if possible, is there some pandas wizardry to do this?

Alex Riley
Jon Clements

3 Answers

In pandas 0.18.0 and later, there are datetime floor, ceil and round methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00

Here's another alternative to truncate the timestamps. Unlike floor, it supports truncating to a precision such as year or month.

You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:

df['dt'].values.astype('<M8[h]')

This truncates everything to hour precision. For example:

>>> df
                       dt
0     2014-10-01 10:02:45
1     2014-10-01 13:08:17
2     2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]

The same method should work for any other unit: months 'M', minutes 'm', and so on:

  • Keep up to year: '<M8[Y]'
  • Keep up to month: '<M8[M]'
  • Keep up to day: '<M8[D]'
  • Keep up to minute: '<M8[m]'
  • Keep up to second: '<M8[s]'
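
As a rough sketch of the coarser units, the snippet below truncates to the start of the month. It assumes a timezone-naive column, and the sample data and the `month_start` column name are made up for illustration. Depending on the pandas version, assigning a coarse-unit array directly may not produce a `datetime64[ns]` column, so the sketch casts back to `[ns]` explicitly.

```python
import pandas as pd

# Hypothetical sample data spanning two months (timezone-naive).
df = pd.DataFrame({
    "dt": pd.to_datetime(["2014-10-01 10:02:45", "2014-11-15 13:08:17"])
})

# Cast the underlying NumPy array to month precision; anything finer is dropped.
months = df["dt"].values.astype("<M8[M]")

# Cast back to nanosecond precision so the stored dtype is predictable.
df["month_start"] = months.astype("<M8[ns]")

print(df)
```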
Alex Riley
  • Great answer. Waaay faster than `datetime.replace`, which is the most common solution mentioned on SO. – Def_Os Apr 14 '16 at 21:44
  • My dt series had a millisecond term (+00:00) that I wanted to get rid of. First part of your answer seems to round (take terms to zero) but not truncate. Second part of your answer with astype did the trick. Thanks! – Ben Holmquist Nov 01 '19 at 20:02
  • for minute frequency all the three: 'm', 'M' and ' – caped114 Jun 02 '20 at 11:55
  • Super helpful ty! Midnight hours (00:00:00) did not show up for me in `df.head()` so I thought there was something wrong after applying `df['hour'].dt.floor('h')`. But there was nothing wrong, checking `df['hour'].loc[0]` revealed the correct hourly truncation. – callpete Mar 02 '21 at 02:30
  • As a note, to do something like "truncate to every six hours" you can use `.dt.floor('6H')` See [combining-aliases](https://pandas.pydata.org/docs/user_guide/timeseries.html#combining-aliases) in the docs. – sql_knievel Jul 15 '22 at 14:59

A method I've used in the past to accomplish this goal was the following (quite similar to what you're already doing, but thought I'd throw it out there anyway):

df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))
aseagram
David Hagan

Alternatively:

df['dt'].dt.to_period("h") # for the period
df['dt'].dt.to_period("h").dt.to_timestamp() # for the truncated timestamp

This is arguably the least ambiguous way to achieve this. Note that floor/round/ceil only accept fixed frequencies, so using them for coarser units (months, years, ...) raises an error such as:

ValueError: <YearEnd: month=12> is a non-fixed frequency

See discussion here: https://github.com/pandas-dev/pandas/issues/15303
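
A minimal sketch of the period route for a non-fixed frequency, assuming a `dt` column as in the question; the sample data and the `month` and `month_start` column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data spanning two months.
df = pd.DataFrame({
    "dt": pd.to_datetime(["2014-10-01 10:02:45", "2014-11-15 13:08:17"])
})

# Convert each timestamp to the monthly period that contains it.
df["month"] = df["dt"].dt.to_period("M")

# Convert the periods back to timestamps at the start of each period.
df["month_start"] = df["month"].dt.to_timestamp()

print(df)
```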

Adav