0
 df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"],format='%d-%m-%y')

I have tried to convert the date column, the dataset contains more than 1 million rows... I have to find the date rows which are not getting converted.

TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-124-d701d963ff8c> in <module>
 ----> 1 df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"],format='%d-%m-%y')

c:\users\dell\appdata\local\programs\python\python39\lib\site-packages\pandas\core\tools\datetimes.py 
in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, 
origin, cache)
803             result = arg.map(cache_array)
804         else:
--> 805             values = convert_listlike(arg._values, format)
806             result = arg._constructor(values, index=arg.index, name=arg.name)
807     elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):

c:\users\dell\appdata\local\programs\python\python39\lib\site-packages\pandas\core\tools\datetimes.py 
in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, 
yearfirst, exact)
458                 return DatetimeIndex._simple_new(dta, name=name)
459             except (ValueError, TypeError):
--> 460                 raise e
461 
462     if result is None:

c:\users\dell\appdata\local\programs\python\python39\lib\site-packages\pandas\core\tools\datetimes.py 
in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, 
yearfirst, exact)
421             if result is None:
422                 try:
--> 423                     result, timezones = array_strptime(
424                         arg, format, exact=exact, errors=errors
425                     )

pandas\_libs\tslibs\strptime.pyx in pandas._libs.tslibs.strptime.array_strptime()

ValueError: unconverted data remains: 12
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
Thiru Balaji G
  • 163
  • 2
  • 10
  • Welcome back to Stack Overflow. To refresh your memory, please read [ask] and the [formatting help](https://stackoverflow.com/help/formatting), and then *ask a question* - in the body itself, not just your title. Make sure it is clear what exactly you need help with, and why you are not able to solve the problem yourself. – Karl Knechtel Sep 30 '21 at 07:35
  • Hi, I have tried to convert the date column using the pandas to_datetime method, during conversion some rows are not getting converted, i need to find what are the exact error rows ? – Thiru Balaji G Sep 30 '21 at 07:40
  • When someone tells you to read some things and explains how to fix the question, you should not try to ask your question again to that person. You should read the linked articles, and use that advice in order to edit and fix the post. – Karl Knechtel Sep 30 '21 at 07:44
  • New to StackOverflow! Thanks for showing a good way! – Thiru Balaji G Sep 30 '21 at 08:15

2 Answers2

2

an efficient solution would be to parse the date strings to datetime, with keyword errors set to 'coerce'. That will give NaT (not-a-time) for invalid strings. You can derive a boolean mask from that by calling .isnull() which you can then use to extract respective values.

Ex:

import pandas as pd

df = pd.DataFrame({"Dt_Customer": ["28-12-2020", "not a date"]})

invalid = df.loc[pd.to_datetime(df["Dt_Customer"],
                                format='%d-%m-%Y',
                                errors='coerce').isnull(), "Dt_Customer"]

print(invalid)
1    not a date
Name: Dt_Customer, dtype: object

Note that you can also omit the format keyword to make the check unspecific, i.e. accepting any date/time format the parser can parse.

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
1

you could try a loop with try and except:

causing_error_list = []
for x in df["Dt_Customer"].values:
    try:
        pd.to_datetime(x,format='%d-%m-%y')
    except:
        causing_error_list.append(x)
Ezer K
  • 3,637
  • 3
  • 18
  • 34