0

I was trying to convert a date stored in my dataframe to DateTime format. The column i'm trying to convert has dates stored in mm/dd/yy format.

This is the script i used to convert:

df['dt'] = pd.to_datetime(df['dt'], format = '%d-%m-%Y')

The script runs without an error converting the dates accurately even-though the format provided is not correct.

My question is why the script didn't throw an error when wrong format is provided?

sheeni
  • 377
  • 1
  • 17
  • 1
    Because your data does not contain dates with day > 12? My system throws an error with `'01-31-2019'`. – Quang Hoang Apr 01 '20 at 14:27
  • Just tried the same in a dummy df and it throw an error. Turns out when i create the date frame from the excel file pandas already store the column as datetime. does pd.read_excel infer non standard date types? – sheeni Apr 01 '20 at 14:37
  • @sheeni Excel stores dates internally as floats with an indicator that the cell contains a date value. When pandas reads that sheet it uses that information about the cell and converts that float into a datetime object – Jon Clements Apr 01 '20 at 14:43
  • 1
    @sheeni pandas does automatically infer data types from column values in any of its input functions. – Vishnudev Krishnadas Apr 01 '20 at 14:44
  • @Vishnudev thank you! – sheeni Apr 01 '20 at 14:49

2 Answers2

0

Consider the date 1-2-2020. Now just by looking at the date can you say exactly what date it is? The answer is no, because, unless you know how the date is formatted or how the date was created i.e whether Day-Month-Year or Month-Day-Year, you can't really say whether the above date is 1st February 2020 or 2nd January 2020. So, the key here is verifying the dataset and it's origins. There are multiple intuition techniques that you can apply to your data, like, if the data is originated from the United States, the common date format is MM/DD/YYYY or if India it is DD-MM-YY.

SAMPLE

>>> import pandas as pd
>>> df = pd.DataFrame({'dt': ['1-1-2020', '15-2-2020', '3-24-2020']})
>>> df
          dt
0   1-1-2020
1  15-2-2020
2  3-24-2020

CODE - Throws error as expected

>>> pd.to_datetime(df['dt'], format = '%d-%m-%Y')
Traceback (most recent call last):
  File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 448, in _convert_listlike_datetimes
    values, tz = conversion.datetime_to_datetime64(arg)
  File "pandas/_libs/tslibs/conversion.pyx", line 200, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
    return func(*args, **kwargs)
  File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 778, in to_datetime
    values = convert_listlike(arg._values, True, format)
  File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 451, in _convert_listlike_datetimes
    raise e
  File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 416, in _convert_listlike_datetimes
    arg, format, exact=exact, errors=errors
  File "pandas/_libs/tslibs/strptime.pyx", line 142, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data '3-24-2020' does not match format '%d-%m-%Y' (match)
Vishnudev Krishnadas
  • 10,679
  • 2
  • 23
  • 55
0

The code below it works for me:

df['date'] = pd.to_datetime(df['date'], format = '%d-%m-%Y', unit='ns')

or

df['date'] = pd.to_datetime(df['date'], format = '%d-%m-%Y')
df['date'] = pd.to_datetime(df.date, unit='ns')
Fariliana Eri
  • 181
  • 2
  • 5