For this dataframe (link):
Issue date Redemption date
0 14-Jan-20 14-Jan-21
1 31-Jul-20 29-Jan-21
2 14-Feb-20 12-Feb-21
3 31-Aug-20 26-Feb-21
4 1-Sep-10 1-Mar-21
I want to convert columns [Issue date, Redemption date] to datetime format.
The format variations are:
- Month is lower case: 01-set-09
- Month in Italian 01-set-09 (settembre-italian instead of September-English)
I have tried using:
df['Issue date'] = pd.to_datetime(df['Issue date'], infer_datetime_format=True)
and this one:
pd.to_datetime(df['Issue date'], format='%d-%b-%y')
however i get the following error:
ParserError: Unknown string format: 01-set-09
Here is a working code that generates my error:
import pandas as pd
import numpy as np
#setup dateframe
url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv"
skiprows=4
df = pd.read_csv(url, sep=None, skiprows=skiprows)
print (df.dtypes)
# convert date wont work
df['Issue date'] = pd.to_datetime(df['Issue date'], infer_datetime_format=True)
#also tried this ad wont work
pd.to_datetime(df['Issue date'], format='%d-%b-%y')
EDIT FROM MrFuppes answer
when running the code suggested (I am using GoogleColab):
import locale
import pandas as pd
url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv"
df = pd.read_csv(url, skiprows=4)
locale.setlocale(locale.LC_TIME, "en_us") # try English first
df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')
locale.setlocale(locale.LC_TIME, "it_it") # we also have Italian...
m = df['datetime'].isnull() # where English did not work
df.loc[m, 'datetime'] = pd.to_datetime(df['Issue date'][m], format='%d-%b-%y', errors='coerce')
print(df['datetime'].isnull().value_counts())
# False 238
# True 2 # these two were already NaN on import
I get the following error:
Error Traceback (most recent call last)
<ipython-input-1-be04fb51ee5f> in <module>()
5 df = pd.read_csv(url, skiprows=4)
6
----> 7 locale.setlocale(locale.LC_TIME, "en_us") # try English first
8 df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')
9
/usr/lib/python3.7/locale.py in setlocale(category, locale)
606 # convert to string
607 locale = normalize(_build_localename(locale))
--> 608 return _setlocale(category, locale)
609
610 def resetlocale(category=LC_ALL):
Error: unsupported locale setting