I'm cleaning a dataset and found that the Year column mixes several formats, e.g. 2011, 2012-2013, and 2010-14. How can I normalize these so each cell shows only the latest year, i.e. 2011, 2013, and 2014?
I tried the code below. It works for '2012-2013' (the cell is updated to 2013), but for '2010-14' the output is '0-14' instead of '2014'. How can I fix it? Thanks.
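def clean_year(year):
    year = str(year).strip()  # normalize: handle non-string cells and stray whitespace
    if len(year) == 4:
        return year
    elif '-' in year:
        start, end = year.split('-', 1)
        start, end = start.strip(), end.strip()
        if len(end) == 2:
            # two-digit end year: reuse the century of the start year
            return start[:2] + end
        return end
    return year  # leave unrecognized values unchanged

dataset1['Year'] = dataset1['Year'].apply(clean_year)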
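
For comparison, here is a minimal regex-based sketch of the same idea. It is only one possible approach, not a confirmed fix for the '0-14' output. The name `latest_year` and the pattern `_YEAR_RE` are hypothetical. The sketch assumes that cells may contain stray spaces or an en dash instead of a plain hyphen, and that a two-digit end year smaller than the start year's last two digits rolls over into the next century.

```python
import re

# A 4-digit year, optionally followed by a hyphen or en dash (\u2013)
# and a 4- or 2-digit end year, e.g. "2011", "2012-2013", "2010-14".
_YEAR_RE = re.compile(r"^\s*(\d{4})(?:\s*[-\u2013]\s*(\d{4}|\d{2}))?\s*$")


def latest_year(value):
    """Return the latest year in `value` as a 4-digit string, or None if it does not match."""
    match = _YEAR_RE.match(str(value))
    if match is None:
        return None  # unrecognized format: flag it rather than guess
    start, end = match.groups()
    if end is None:
        return start  # single year, nothing to resolve
    if len(end) == 4:
        return end
    # Two-digit end year: borrow the century from the start year.
    full_end = int(start[:2]) * 100 + int(end)
    if full_end < int(start):
        # Assumption: a range like "1998-02" crosses into the next century.
        full_end += 100
    return str(full_end)


# Applied to the column from the question (assumes pandas and dataset1 exist):
# dataset1['Year'] = dataset1['Year'].apply(latest_year)
```

Returning None for values that do not match makes leftover problem cells easy to find afterwards, for example with `dataset1['Year'].isna()`.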