0

I have a csv file of a years worth of time series data where the time stamp looks like the code insert below. One thing to mention about the data its a 30 year averaged hourly weather data, so there isnt a year specified with the time stamp.

Date
01-01T01:00:00
01-01T02:00:00
01-01T03:00:00
01-01T04:00:00
01-01T05:00:00
01-01T06:00:00
01-01T07:00:00
01-01T08:00:00
01-01T09:00:00
01-01T10:00:00
01-01T11:00:00
01-01T12:00:00
01-01T13:00:00
01-01T14:00:00
01-01T15:00:00
01-01T16:00:00
01-01T17:00:00
01-01T18:00:00
01-01T19:00:00
01-01T20:00:00
01-01T21:00:00
01-01T22:00:00
01-01T23:00:00

I can read the csv file just fine:

df = pd.read_csv('weather_cleaned.csv', index_col='Date', parse_dates=True)

If I do a pd.to_datetime(df) this will error out: ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing

Would anyone have any tips to convert my df to datetime?

bbartling
  • 3,288
  • 9
  • 43
  • 88

2 Answers2

2

You can pass date_parser argument (check docs), e.g.

import pandas as pd
from datetime import datetime

df = pd.read_csv('weather_cleaned.csv', index_col='Date', parse_dates=['Date'], 
                  date_parser=lambda x: datetime.strptime(x, '%d-%mT%H:%M:%S'))
print(df.head())

output

Empty DataFrame
Columns: []
Index: [1900-01-01 01:00:00, 1900-01-01 02:00:00, 1900-01-01 03:00:00, 1900-01-01 04:00:00, 1900-01-01 05:00:00]

of course you can define different function, maybe specify different year, etc.. e.g. if you want year 2020 instead of 1900 use

date_parser=lambda x: datetime.strptime(x, '%d-%mT%H:%M:%S').replace(year=2020)

Note I assume it's day-month format, change format string accordingly.

EDIT: Change my example to reflect that Date column should be used as index.

buran
  • 13,682
  • 10
  • 36
  • 61
  • Hi, would you know how to get around with? `ValueError: time data '01-13T00:00:00' does not match format '%d-%mT%H:%M:%S'` – bbartling May 07 '20 at 14:42
  • read my note - I assume day-month format. change the format string to `'%m-%dT%H:%M:%S'` – buran May 07 '20 at 14:46
1

One thing you can do is to append a default year:

 pd.to_datetime('2020-' + df['Date'])
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74