0

For this dataframe (link):

    Issue date Redemption date
0    14-Jan-20       14-Jan-21
1    31-Jul-20       29-Jan-21
2    14-Feb-20       12-Feb-21
3    31-Aug-20       26-Feb-21
4     1-Sep-10        1-Mar-21

I want to convert columns [Issue date, Redemption date] to datetime format.

The format variations are:

  • Month is lower case: 01-set-09
  • Month in Italian 01-set-09 (settembre-italian instead of September-English)

I have tried using:

df['Issue date'] = pd.to_datetime(df['Issue date'], infer_datetime_format=True)

and this one:

pd.to_datetime(df['Issue date'], format='%d-%b-%y')

however i get the following error:

ParserError: Unknown string format: 01-set-09

Here is a working code that generates my error:

import pandas as pd 
import numpy as np 
 #setup dateframe 
url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv"
skiprows=4
df = pd.read_csv(url, sep=None, skiprows=skiprows)
print (df.dtypes)

 # convert date wont work 
df['Issue date'] = pd.to_datetime(df['Issue date'], infer_datetime_format=True)
 #also tried this ad wont work 
pd.to_datetime(df['Issue date'], format='%d-%b-%y')

EDIT FROM MrFuppes answer

when running the code suggested (I am using GoogleColab):

import locale
import pandas as pd

url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv"
df = pd.read_csv(url, skiprows=4)

locale.setlocale(locale.LC_TIME, "en_us") # try English first
df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')

locale.setlocale(locale.LC_TIME, "it_it") # we also have Italian...
m = df['datetime'].isnull() # where English did not work
df.loc[m, 'datetime'] = pd.to_datetime(df['Issue date'][m], format='%d-%b-%y', errors='coerce')

print(df['datetime'].isnull().value_counts())
# False    238
# True       2 # these two were already NaN on import

I get the following error:

Error                                     Traceback (most recent call last)
<ipython-input-1-be04fb51ee5f> in <module>()
      5 df = pd.read_csv(url, skiprows=4)
      6 
----> 7 locale.setlocale(locale.LC_TIME, "en_us") # try English first
      8 df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')
      9 

/usr/lib/python3.7/locale.py in setlocale(category, locale)
    606         # convert to string
    607         locale = normalize(_build_localename(locale))
--> 608     return _setlocale(category, locale)
    609 
    610 def resetlocale(category=LC_ALL):

Error: unsupported locale setting
Leo
  • 1,176
  • 1
  • 13
  • 33

2 Answers2

1

There are some another months names, seems like italian.

If need convert only english add errors='coerce', so another strings are converted to NaNs:

pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')

Or you can replace italian susbtrings first by dictionary:

d = {'set': 'Sep', 'mag': 'Mar'}

s = df['Issue date'].replace(d, regex=True)
df['Issue date'] = pd.to_datetime(s, format='%d-%b-%y', errors='coerce')
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, With your dateparser example, it returns an object column, how do i convert it to datetime column? – Leo Aug 20 '21 at 07:08
  • @Leo - Just tested, and seems wrng parsed 236 row, can you check it if for you working well? If yes, use `df['Issue date'] = pd.to_datetime(df['Issue date'].apply(conv), utc=True)` – jezrael Aug 20 '21 at 07:10
  • 1
    @jezrael I think `dateparser` gives incorrect results here; e.g. it parses `'01-set-16'` to `2021-01-16 00:00:00+01:00` which is invalid because the UTC offset of 1 hour isn't specified in the input. – FObersteiner Aug 20 '21 at 08:23
  • 1
    @MrFuppes - agree, so removed. – jezrael Aug 20 '21 at 08:30
1

If you know which languages to expect in advance, another option would be to set the appropriate locale. Side note, the names of the locales (i.e. the string to use for setting a specific one) depend on the operating system.

Example, Windows locale:

import locale
import pandas as pd

url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv"
df = pd.read_csv(url, skiprows=4)

locale.setlocale(locale.LC_TIME, "en_us") # try English first
df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')

locale.setlocale(locale.LC_TIME, "it_it") # we also have Italian...
m = df['datetime'].isnull() # where English did not work
df.loc[m, 'datetime'] = pd.to_datetime(df['Issue date'][m], format='%d-%b-%y', errors='coerce')

print(df['datetime'].isnull().value_counts())
# False    238
# True       2 # these two were already NaN on import
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • when running your code (in COLAB) i am getting the following error (Full error added to quesiton): ----> 7 locale.setlocale(locale.LC_TIME, "en_us") # try English first ------ Error: unsupported locale setting - – Leo Aug 20 '21 at 08:47
  • 1
    @Leo sorry I forgot to add, the appropriate locale setting (i.e. string to use for a specific language) depends on the OS you run. My example works on Windows. On Unix systems, you'll need to adjust the settings. See also [this](https://stackoverflow.com/q/19709026/10197418). – FObersteiner Aug 20 '21 at 08:53
  • 1
    @Leo addendum: I think this won't work on Google Colab since you can't change the locale there (the only supported locale is en_US afaik). See also [this](https://stackoverflow.com/questions/67045349/change-locale-for-google-colab). – FObersteiner Aug 20 '21 at 09:13
  • Thanks, I've been trying with no luck. I can see the it_IT.UTF-8 listed when running ! sudo dpkg-reconfigure locales, but no idea if this means it is possible to change the locale or not. – Leo Aug 20 '21 at 09:18
  • found a way, doing random things but still it works: ! sudo locale-gen "it_IT.UTF-8" thanks! – Leo Aug 20 '21 at 09:23
  • @Leo glad it works for you! if you found a way to do this on Colab, maybe you can add an answer to the question I linked above? ok maybe not "do random things until it works" ^^ – FObersteiner Aug 20 '21 at 09:25