1

I've got a dataframe with a column names birthdates, they are all strings, most are saved as %d.%m.%Y, some are saved as %d.%m.%y.

How can I make this work?

df["birthdates_clean"] = pd.to_datetime(df["birthdates"], format = "%d.%m.%Y")

If this can't work, do I need to filter the rows? How would I do it?

Thanks for taking time to answer!

Sevgisiz
  • 15
  • 3

1 Answers1

1

I am not sure what is the expected output, but you can let to_datetime parse automatically the dates:

df = pd.DataFrame({"birthdates": ['01.01.2000', '01.02.00', '02.03.99',
                                   '02.03.22', '01.01.71', '01.01.72']})
# as datetime
df["birthdates_clean"] = pd.to_datetime(df["birthdates"], dayfirst=True)

# as custom string
df["birthdates_clean2"] = (pd.to_datetime(df["birthdates"], dayfirst=True)
                             .dt.strftime('%d.%m.%Y')
                           )

NB. the shift point is currently at 71/72. 71 gets evaluated as 2071 and 72 as 1972

output:

   birthdates birthdates_clean birthdates_clean2
0  01.01.2000       2000-01-01        01.01.2000
1    01.02.00       2000-02-01        01.02.2000
2    02.03.99       1999-03-02        02.03.1999
3    02.03.22       2022-03-02        02.03.2022
4    01.01.71       2071-01-01        01.01.2071
5    01.01.72       1972-01-01        01.01.1972
mozway
  • 194,879
  • 13
  • 39
  • 75
  • 1
    Yes, thank you so much the setting "dayfirst = True" helped a lot! – Sevgisiz Mar 17 '22 at 09:48
  • I got to one more Problem though, as you also showed it in your example; the 71 is now a 2071. Is there any setting for "to_datetime" that would scale it down inbetween 100 years to today? So everything over 22 would be 19xx and everything under 22 would be 20xx? – Sevgisiz Mar 17 '22 at 11:02