1

I tried:

df["datetime_obj"] = df["datetime"].apply(lambda dt: datetime.strptime(dt, "%d/%m/%Y %H:%M"))

but got this error:

ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'

How to solve it correctly?

jpp
  • 159,742
  • 34
  • 281
  • 339
ScalaBoy
  • 3,254
  • 13
  • 46
  • 84

3 Answers3

1

The reason why this does not work is because the %H parameter only accepts values in the range of 00 to 23 (both inclusive). This thus means that 24:00 is - like the error says - not a valid time string.

I think therefore we have not much other options than convert the string to a valid format. We can do this by first replacing 24:00 with 00:00, and then later increment the day for these timestamps.

Like:

from datetime import timedelta
import pandas as pd

df['datetime_zero'] = df['datetime'].str.replace('24:00', '0:00')
df['datetime_er'] = pd.to_datetime(df['datetime_zero'], format='%d/%m/%Y %H:%M')
selrow = df['datetime'].str.contains('24:00')
df['datetime_obj'] = df['datetime_er'] + selrow * timedelta(days=1)

The last line thus adds one day to the rows that contain 24:00, such that '10/11/2006 24:00' gets converted to '11/11/2006 24:00'. Note however that the above is rather unsafe since depending on the format of the timestamp this will/will not work. For the above it will (probably) work, since there is only one colon. But if for example the datetimes have seconds as well, the filter could get triggered for 00:24:00, so it might require some extra work to get it working.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • Cool, but I also get this warning. Do you know what it means? /opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy – ScalaBoy Oct 07 '18 at 13:52
  • @ScalaBoy: yes, what is not clear about the warning? – Willem Van Onsem Oct 07 '18 at 13:54
  • @ScalaBoy: by multiplying instead of filtering. But unfortunately this can result in an extra ineffeciency, since you already seem to use `.apply(..)` the performance is probably already not that huge :) – Willem Van Onsem Oct 07 '18 at 14:01
  • Nice explanation. I feel you shouldn't *have* to replace `'24:00'` and *also* check `str.contains('24:00')` as separate steps. You can use a Boolean series and reuse it. I attempted this in my answer. – jpp Oct 07 '18 at 14:20
  • @jpp: I use a boolean series. In fact all the above are building a *single* expression tree. – Willem Van Onsem Oct 07 '18 at 14:21
  • @jpp: well I think a problem with "changing" the row is that it is less declarative. Since declarative programming is typically safer than imperative programming, I prefer to not "alter" variables, only construct new ones :) – Willem Van Onsem Oct 07 '18 at 14:23
  • 1
    @WillemVanOnsem, Understood, guess this is more stylistic than anything. – jpp Oct 07 '18 at 14:24
1

Your data doesn't follow the conventions used by Python / Pandas datetime objects. There should be only one way of storing a particular datetime, i.e. '10/11/2006 24:00' should be rewritten as '11/11/2006 00:00'.

Here's one way to approach the problem:

# find datetimes which have '24:00' and rewrite
twenty_fours = df['strings'].str[-5:] == '24:00'
df.loc[twenty_fours, 'strings'] = df['strings'].str[:-5] + '00:00'

# construct datetime series
df['datetime'] = pd.to_datetime(df['strings'], format='%d/%m/%Y %H:%M')

# add one day where applicable
df.loc[twenty_fours, 'datetime'] += pd.DateOffset(1)

Here's some data to test:

dateList = ['10/11/2006 24:00', '11/11/2006 00:00', '12/11/2006 15:00']
df = pd.DataFrame({'strings': dateList})

Result after transformations described above:

print(df['datetime'])

0   2006-11-11 00:00:00
1   2006-11-11 00:00:00
2   2006-11-12 15:00:00
Name: datetime, dtype: datetime64[ns]
jpp
  • 159,742
  • 34
  • 281
  • 339
0

As indicated in the documentation (https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior), hours go from 00 to 23. 24:00 is then an error.

Matthieu Brucher
  • 21,634
  • 7
  • 38
  • 62
  • Ok, thanks. But how can I solve the problem? Is it possible to reveal these cases and substitute 24 by 0? – ScalaBoy Oct 07 '18 at 13:31
  • It's a classic form of bad data. You have two options. Either you fix them by fixing what is generating your data (the best option) or you fix your data by changing say `' 24:'` by `' 00:'` (simple thing to do in Python). – Matthieu Brucher Oct 07 '18 at 13:44