I have a data frame:

print(df_test)

               Name Birth Date
0     Anna B Wilson   JUL 1861
1  Victor C Burnett   NOV 1847
2     Ausia Burnett   JUN 1898
3    Alfred Burnett   MAR 1896
4     Viola Burnett   AUG 1894

I would like the output to be:

               Name Birth Date
0     Anna B Wilson     7-1861
1  Victor C Burnett    11-1847
2     Ausia Burnett     6-1898
3    Alfred Burnett     3-1896
4     Viola Burnett     8-1894

Is there a concise way for me to do this without writing a separate regex for each month, i.e.

df_test = df_test.replace(to_replace=r'(MAR)\s(\d{4})', value=r'3-\2', regex=True)
df_test = df_test.replace(to_replace=r'(JUN)\s(\d{4})', value=r'6-\2', regex=True)
df_test = df_test.replace(to_replace=r'(JUL)\s(\d{4})', value=r'7-\2', regex=True)
df_test = df_test.replace(to_replace=r'(AUG)\s(\d{4})', value=r'8-\2', regex=True)
df_test = df_test.replace(to_replace=r'(NOV)\s(\d{4})', value=r'11-\2', regex=True)
print(df_test)

?

EDIT: There is a fly in the ointment: the dates are not all in the same format. For example, there are anomalies like those in rows 5-8:

                       Name    Birth Date
0             Anna B Wilson      JUL 1861
1          Victor C Burnett      NOV 1847
2             Ausia Burnett      JUN 1898
3            Alfred Burnett      MAR 1896
4             Viola Burnett      AUG 1894
5             Marinda Lynde          1843
6              Iola Staffen  Jan Abt 1880
7  Maryella Dolores Staffin   30 AUG 1913
8   Norman Lawrence Schmitt   22 JUN 1945
fredh3

1 Answer

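You don't actually need regex. When every value looks like 'JUL 1861', you can parse the column with pd.to_datetime() and then call .dt.strftime() to specify the desired format. Note that '%m' zero-pads the month, so July comes out as 07 rather than 7. For example: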

import pandas as pd
test_df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                        'Birthdate': ['JUL 1861', 'NOV 1847', 'JUN 1898', 'MAR 1896', 'AUG 1894']})
# '%b %Y' parses an abbreviated month name followed by a four-digit year
test_df['Birthdate'] = pd.to_datetime(test_df['Birthdate'], format='%b %Y').dt.strftime('%m-%Y')

Outputs:

  Name Birthdate
0    A   07-1861
1    B   11-1847
2    C   06-1898
3    D   03-1896
4    E   08-1894
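
If the column mixes formats, as in the edit to the question, one option is a single pattern plus a month lookup table instead of one regex per month. The following is only a rough sketch, not a complete cleaner: MONTHS, PATTERN, and to_month_year are names made up for this example, and it assumes that a leading day and an "Abt" qualifier can be dropped and that year-only entries should be left untouched. It also produces the unpadded month (7-1861) asked for in the question.

```python
import re
import pandas as pd

# Month abbreviation -> month number; keys are upper-case for lookup.
MONTHS = {abbr: num for num, abbr in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Optional day, three-letter month, optional "Abt", four-digit year.
PATTERN = re.compile(
    r"^\s*(?:\d{1,2}\s+)?(?P<mon>[A-Za-z]{3})\s+(?:abt\s+)?(?P<year>\d{4})\s*$",
    flags=re.IGNORECASE,
)

def to_month_year(value):
    """Return 'M-YYYY' if a month and year are found, else the value unchanged."""
    if not isinstance(value, str):
        return value  # e.g. NaN stays as it is
    match = PATTERN.match(value)
    if match is None:
        return value  # e.g. a bare year like '1843'
    month = MONTHS.get(match.group("mon").upper())
    if month is None:
        return value  # three letters that are not a month abbreviation
    return f"{month}-{match.group('year')}"

df_test = pd.DataFrame({
    "Name": ["Anna B Wilson", "Victor C Burnett", "Ausia Burnett",
             "Alfred Burnett", "Viola Burnett", "Marinda Lynde",
             "Iola Staffen", "Maryella Dolores Staffin",
             "Norman Lawrence Schmitt"],
    "Birth Date": ["JUL 1861", "NOV 1847", "JUN 1898", "MAR 1896",
                   "AUG 1894", "1843", "Jan Abt 1880", "30 AUG 1913",
                   "22 JUN 1945"],
})
df_test["Birth Date"] = df_test["Birth Date"].map(to_month_year)
print(df_test)
```

Anything the pattern does not recognise is passed through unchanged, so unexpected formats stay visible for manual review rather than being silently altered. If the day or the "Abt" qualifier needs to be preserved, the pattern and the return format would have to be adjusted.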
Celius Stingher
  • So this would work if the data were all in the same format. Please see the edit. – fredh3 Jun 17 '20 at 21:32
  • In the future, please make sure to include all the relevant data. Also, if there is only a year with no month, how can we infer the month? – Celius Stingher Jun 17 '20 at 21:34
  • You cannot. This is ancestry data. All that was known for that individual is their birth year. This is not data I compiled and is filled with many formatting inconsistencies which I am trying to fix. – fredh3 Jun 17 '20 at 21:37
  • I withheld the anomalies because I thought they would muddle the specific task at hand, i.e. taking entries like 'JAN 1941' and converting them to month-year format. – fredh3 Jun 17 '20 at 21:44