Recent pandas versions (e.g. 1.5.0) throws a helpful error that shows where the unparsable string is in the column. That said, a common way to solve this error is to pass the correct format=
especially if the dates have some "exotic" format (for a list of all possible formats, check https://strftime.org/) such as having underscores (_
), em dash (—
), unexpected white space as separators.
df = pd.DataFrame({'date': ['19_Mar_2017']})
pd.to_datetime(df['date']) # error
pd.to_datetime(df['date'], format='%d_%b_%Y') # OK
df = pd.DataFrame({'date': ['19.Mar. 2017']})
pd.to_datetime(df['date']) # error
pd.to_datetime(df['date'], format='%d.%b. %Y') # OK
If the datetimes have mixed formats, use the errors='coerce'
argument to pass different formats and combine all using fillna()
. For example,
df = pd.DataFrame({'date': ['19.Mar. 2017', '20.Mar.2017']}) # <--- has mixed format
first = pd.to_datetime(df['date'], format='%d.%b. %Y', errors='coerce')
second = pd.to_datetime(df['date'], format='%d.%b.%Y', errors='coerce')
df['date'] = first.fillna(second)
As an aside, if the data is imported from a file, pandas functions that read such files into dataframes have parse_dates=
parameter that can be used to parse the dates.
# from csv
df = pd.read_csv('foo.csv', parse_dates=True)
# from Excel
df = pd.read_excel('foo.xlsx', parse_dates=True)