0

I am trying to clean a spreadsheet of user-inputted data that includes a "birth_date" column. The issue I am having is that the date formating ranges widely between users, including inputs without markers between the date, month, and year. I am having a hard time developing a formula that is intelligent enough to interpret such a wide range of inputs. Here is a sample:

1/6/46
7/28/99
11272000
11/28/78

Here is where I started:

df['birth_date']=pd.to_datetime(df.birth_date)

This does not seem to make it past the first example, as it looks for a two-month format. Can anyone help with this?

rob.earwaker
  • 1,244
  • 1
  • 7
  • 14

1 Answers1

0

Your best bet is to check each input and give a consistent output. Assuming Month-Day-Year formats, you can use this function

import pandas as pd
import re

def fix_dates(dates):
    new = []
    for date in dates:
        chunks = re.split(r"[\/\.\-]", date)
        if len(chunks) == 3:
            m, d, y = map(lambda x: x.zfill(2), chunks)
            y = y[2:] if len(y) == 4 else y
            new.append(f"{m}/{d}/{y}")
        else:
            m = date[:2]
            d = date[2:4]
            y = date[4:]
            y = y[2:] if len(y) == 4 else y
            new.append(f"{m}/{d}/{y}")
    return new

inconsistent_dates = '1/6/46 7/28/99 11272000 11/28/78'.split(' ')

pd.to_datetime(pd.Series(fix_dates(inconsistent_dates)))

0   2046-01-06
1   1999-07-28
2   2000-11-27
3   1978-11-28
dtype: datetime64[ns]
Ali Shannon
  • 186
  • 3
  • Thanks, this was helpful. It is now stuck on an input format with a single-digit month and two-digit day. So 4062001 will return 40/62/001. Presumably, if I solve this error, another will arise. The best way would be to standardize the input on the front end, but I don't have that access. – Reid Haynie Sep 03 '20 at 14:09
  • I think the best way to solve the issue, is if the month is greater than 12 to take the year and assign it to January 1st. We really just need to verify the year they were born in, rather than their exact birthdate. – Reid Haynie Sep 03 '20 at 15:45