0

I am cleaning some data working on Pandas and have one column year containing two formats of years 19 and 2019.

I am currently using pd.to_datetime(chelsea["Year"], format="%y")

it is giving me errors. Now how can I convert this column into a single format like 2019

ERROR is ValueError: unconverted data remains: 93

  • Where does `93` come from? Please include a simple repro. – jarmod Jan 02 '23 at 20:14
  • What is the output of `chelsea["Year"].unique()`? – mozway Jan 02 '23 at 20:18
  • A total of 958 rows, so 93 may be unable to change – Muhammad Anas Atiq Jan 02 '23 at 20:25
  • array(['93', '1993', '1994', '94', '95', '1995', '1996', '96', '1997', '97', '1998', '98', '1999', '99', '2000', '00', '2001', '01', '2002', '02', '2003', '03', '2004', '04', '2005', '05', '2006', '06', '2007', '07', '2008', '08', '2009', '09', '10', '2010', '2011', '11', '2012', '12', '2013', '13', '2014', '14', '2015', '15', '2016', '16', '2017', '17', '2018', '18'], dtype=object) These are different years. – Muhammad Anas Atiq Jan 02 '23 at 20:26

1 Answers1

1

Have a look here: Handling multiple datetime formats with pd.to_datetime

In short, you can use errors=coerce to fill any years not matching a pattern with NaT. If you repeat that for all patterns you have in the data, you can then combine the different results together:

import pandas as pd

ser = pd.Series(["2019", "73", "2020", "01", "13", "1998", "99"])

one = pd.to_datetime(ser, format="%Y", errors="coerce")
two = pd.to_datetime(ser, format="%y", errors="coerce")

out = one.fillna(two)

out:

0   2019-01-01
1   1973-01-01
2   2020-01-01
3   2001-01-01
4   2013-01-01
5   1998-01-01
6   1999-01-01
dtype: datetime64[ns]

you can convert those back to just the year:

years = out.dt.strftime("%Y")
Chrysophylaxs
  • 5,818
  • 3
  • 10
  • 21