I have hourly data for an entire year stored in a Pandas column that's an object dtype. The data was imported from a .CSV file with the following structure:

Date/Time,kWh
 01/01  01:00:00,1.14168620105289
 01/01  02:00:00,0.998495769210657
 01/01  03:00:00,0.949679309420898
 01/01  04:00:00,0.938080118507197
 01/29  20:00:00,1.14161727165962
 01/29  21:00:00,1.01263083086978
 01/29  22:00:00,0.961652730472469
 01/29  23:00:00,0.951211299856564
 01/29  24:00:00,0.949390070561629

So the Date/Time column includes month, day, hours, minutes, and seconds. I'm trying to (1) convert that column to DateTime and (2) set the year as 2019.

I'm running into a problem with the conversion because, for some strange reason, the group that coded the .CSV file chose hours ranging from 01:00 to 24:00 rather than the normal 24-hour range of 00:00 to 23:59. So when I run the following command:

cons['Date/Time'] =  pandas.to_datetime(cons['Date/Time'], format=' %m/%d  %H:%M:%S')

I get the following error:

ValueError: time data ' 01/01  24:00:00' does not match format ' %m/%d  %H:%M:%S' (match)

I'm looking for assistance converting the column to DateTime and setting the data's year as 2019. Any help would be appreciated.

  • To clarify, is `24:00` midnight (`00:00`) and all the other hours are incorrect (and if so, which day is it midnight of), or are all hours shifted by 1 (`01:00` is midnight)? – Oliver.R Mar 30 '20 at 03:23
  • This answer explains how to convert `24:00` to `00:00` --> https://stackoverflow.com/questions/52688645/valueerror-time-data-10-11-2006-2400-does-not-match-format-d-m-y-hm/52689127#52689127 – dzakyputra Mar 30 '20 at 03:35
  • All hours are shifted by 1. The year starts at 1:00 on Jan. 1st and the last data point is 24:00 on Dec. 31st. – Christopher Worley Mar 30 '20 at 05:29

1 Answer

Short answer: I used some string manipulation to decrease every hour by one.

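# Assumes each value starts with the two-digit hour, e.g. '01:00:00,1.14...'
df['temp_col'] = df['Date/Time,kWh'].str.split(':').str[0]  # hour as text
df['temp_col'] = (pd.to_numeric(df['temp_col']) - 1).astype(str)  # shift 1-24 to 0-23
df['temp_col'] = df['temp_col'].str.zfill(2)  # restore the leading zero
df['temp_col'] = df['temp_col'] + df['Date/Time,kWh'].str[2:]  # re-attach ':MM:SS,...'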

The output below isn't clean, because the data you posted didn't copy over well with pd.read_clipboard(), but it should give you an idea.

        Date/Time,kWh               temp_col
01/01   01:00:00,1.14168620105289   00:00:00,1.14168620105289
01/01   02:00:00,0.998495769210657  01:00:00,0.998495769210657
01/01   03:00:00,0.949679309420898  02:00:00,0.949679309420898

More detail: I took the first part of the string, where the hours are, converted it to an integer in a new column, and subtracted 1. Then I appended the rest of the string. Per your question, that is one way to get the hours to 0-23 rather than 1-24, since it sounds like every timestamp is an hour later than it should be. Otherwise, if you only want to turn 24:00 into 00:00 (I'm not sure which method is correct for your data), @dzakyputra linked a post in the comments. You could also use similar string manipulation to change 24:00 to 00:00. Hopefully this helps you solve it.

If you wanted to make it a "one-liner" you could, but it is better to break it down into steps.

df['Date/Time,kWh'] = (pd.to_numeric(df['Date/Time,kWh'].str.split(':').str[0])-1).astype(str).apply(lambda x: f'0{x}' if len(x)==1 else x) + df['Date/Time,kWh'].str[2:]

Output:

        Date/Time,kWh
01/01   00:00:00,1.14168620105289
01/01   01:00:00,0.998495769210657
01/01   02:00:00,0.949679309420898
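
To get all the way to a real datetime column with the year set to 2019, one option is to skip the string padding and let pandas do the arithmetic. This is only a sketch, assuming the file parses into the two columns `Date/Time` and `kWh` as in the question, and that every hour really is shifted by one (as clarified in the comments). The inline `csv_text` sample and the names `parts`, `dates`, and `offsets` are just for illustration.

```python
import io

import pandas as pd

# A few rows copied from the question, used here in place of the real file.
csv_text = """Date/Time,kWh
 01/01  01:00:00,1.14168620105289
 01/01  02:00:00,0.998495769210657
 01/29  23:00:00,0.951211299856564
 01/29  24:00:00,0.949390070561629
"""
cons = pd.read_csv(io.StringIO(csv_text))

# Split on whitespace: column 0 holds 'MM/DD', column 1 holds 'HH:MM:SS'.
parts = cons['Date/Time'].str.split(expand=True)

# Parse only the date part, with the year 2019 prepended.
dates = pd.to_datetime('2019/' + parts[0], format='%Y/%m/%d')

# Treat the time part as a duration, so '24:00:00' is valid (it is 1 day).
offsets = pd.to_timedelta(parts[1])

# Shift every reading back one hour: 01:00-24:00 becomes 00:00-23:00.
cons['Date/Time'] = dates + offsets - pd.Timedelta(hours=1)

print(cons)
```

If it turns out that 24:00 should instead mean midnight at the start of the next day, dropping the `- pd.Timedelta(hours=1)` term gives that behavior, because the one-day offset rolls the date forward.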
David Erickson