ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'

Question

I tried:

df["datetime_obj"] = df["datetime"].apply(lambda dt: datetime.strptime(dt, "%d/%m/%Y %H:%M"))

but got this error:

ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'

How to solve it correctly?

@WillemVanOnsem: These values come in the input data, which I should preprocess. Is there any way to overcome this problem? — ScalaBoy, Oct 07 '18 at 13:30
well what does `10/11/2006 24:00` means? That it is `11/11/2006 00:00`? Because that really makes it more *tricky*. — Willem Van Onsem, Oct 07 '18 at 13:31
bit does it mean we "increment" the day? Is `24:00` the *start* or the *end* of the day? — Willem Van Onsem, Oct 07 '18 at 13:32
@WillemVanOnsem: Ok, I see. I would assume that it's the start of the day. So, `10/11/2006 24:00`should be `11/11/2006 00:00` — ScalaBoy, Oct 07 '18 at 13:33

Willem Van Onsem · Accepted Answer · 2018-10-07T14:20:16.653

1

The reason why this does not work is because the %H parameter only accepts values in the range of 00 to 23 (both inclusive). This thus means that 24:00 is - like the error says - not a valid time string.

I think therefore we have not much other options than convert the string to a valid format. We can do this by first replacing 24:00 with 00:00, and then later increment the day for these timestamps.

Like:

from datetime import timedelta
import pandas as pd

df['datetime_zero'] = df['datetime'].str.replace('24:00', '0:00')
df['datetime_er'] = pd.to_datetime(df['datetime_zero'], format='%d/%m/%Y %H:%M')
selrow = df['datetime'].str.contains('24:00')
df['datetime_obj'] = df['datetime_er'] + selrow * timedelta(days=1)

The last line thus adds one day to the rows that contain 24:00, such that '10/11/2006 24:00' gets converted to '11/11/2006 24:00'. Note however that the above is rather unsafe since depending on the format of the timestamp this will/will not work. For the above it will (probably) work, since there is only one colon. But if for example the datetimes have seconds as well, the filter could get triggered for 00:24:00, so it might require some extra work to get it working.

edited Oct 07 '18 at 14:20

answered Oct 07 '18 at 13:48

Willem Van Onsem

443,496
30
428
555

Cool, but I also get this warning. Do you know what it means? /opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy – ScalaBoy Oct 07 '18 at 13:52
@ScalaBoy: yes, what is not clear about the warning? – Willem Van Onsem Oct 07 '18 at 13:54
@ScalaBoy: by multiplying instead of filtering. But unfortunately this can result in an extra ineffeciency, since you already seem to use `.apply(..)` the performance is probably already not that huge :) – Willem Van Onsem Oct 07 '18 at 14:01
Nice explanation. I feel you shouldn't *have* to replace `'24:00'` and *also* check `str.contains('24:00')` as separate steps. You can use a Boolean series and reuse it. I attempted this in my answer. – jpp Oct 07 '18 at 14:20
@jpp: I use a boolean series. In fact all the above are building a *single* expression tree. – Willem Van Onsem Oct 07 '18 at 14:21
@jpp: well I think a problem with "changing" the row is that it is less declarative. Since declarative programming is typically safer than imperative programming, I prefer to not "alter" variables, only construct new ones :) – Willem Van Onsem Oct 07 '18 at 14:23
1

@WillemVanOnsem, Understood, guess this is more stylistic than anything. – jpp Oct 07 '18 at 14:24

jpp · Answer 2 · 2018-10-07T14:23:00.920

Your data doesn't follow the conventions used by Python / Pandas datetime objects. There should be only one way of storing a particular datetime, i.e. '10/11/2006 24:00' should be rewritten as '11/11/2006 00:00'.

Here's one way to approach the problem:

# find datetimes which have '24:00' and rewrite
twenty_fours = df['strings'].str[-5:] == '24:00'
df.loc[twenty_fours, 'strings'] = df['strings'].str[:-5] + '00:00'

# construct datetime series
df['datetime'] = pd.to_datetime(df['strings'], format='%d/%m/%Y %H:%M')

# add one day where applicable
df.loc[twenty_fours, 'datetime'] += pd.DateOffset(1)

Here's some data to test:

dateList = ['10/11/2006 24:00', '11/11/2006 00:00', '12/11/2006 15:00']
df = pd.DataFrame({'strings': dateList})

Result after transformations described above:

print(df['datetime'])

0   2006-11-11 00:00:00
1   2006-11-11 00:00:00
2   2006-11-12 15:00:00
Name: datetime, dtype: datetime64[ns]

score 0 · Answer 3 · answered Oct 07 '18 at 12:51

0

As indicated in the documentation (https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior), hours go from 00 to 23. 24:00 is then an error.

answered Oct 07 '18 at 12:51

Matthieu Brucher

21,634
7
38
62

Ok, thanks. But how can I solve the problem? Is it possible to reveal these cases and substitute 24 by 0? – ScalaBoy Oct 07 '18 at 13:31
It's a classic form of bad data. You have two options. Either you fix them by fixing what is generating your data (the best option) or you fix your data by changing say `' 24:'` by `' 00:'` (simple thing to do in Python). – Matthieu Brucher Oct 07 '18 at 13:44

ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'

3 Answers3

Linked