I have a data frame:
print(df_test)
Name Birth Date
0 Anna B Wilson JUL 1861
1 Victor C Burnett NOV 1847
2 Ausia Burnett JUN 1898
3 Alfred Burnett MAR 1896
4 Viola Burnett AUG 1894
I would like the output to be:
Name Birth Date
0 Anna B Wilson 7-1861
1 Victor C Burnett 11-1847
2 Ausia Burnett 6-1898
3 Alfred Burnett 3-1896
4 Viola Burnett 8-1894
Is there a concise way for me to do this without writing a separate regex for each month, i.e.
df_test = df_test.replace(to_replace ='(MAR)\s(\d{4})', value = r'3-\2', regex = True)
df_test = df_test.replace(to_replace ='(JUN)\s(\d{4})', value = r'6-\2', regex = True)
df_test = df_test.replace(to_replace ='(JUL)\s(\d{4})', value = r'7-\2', regex = True)
df_test = df_test.replace(to_replace ='(AUG)\s(\d{4})', value = r'8-\2', regex = True)
df_test = df_test.replace(to_replace ='(NOV)\s(\d{4})', value = r'11-\2', regex = True)
print(df_test)
?
EDIT: So there is a fly in the ointment. The date data is not all in the same format. For example there are anomalies like those in rows 5-8:
Name Birth Date
0 Anna B Wilson JUL 1861
1 Victor C Burnett NOV 1847
2 Ausia Burnett JUN 1898
3 Alfred Burnett MAR 1896
4 Viola Burnett AUG 1894
5 Marinda Lynde 1843
6 Iola Staffen Jan Abt 1880
7 Maryella Dolores Staffin 30 AUG 1913
8 Norman Lawrence Schmitt 22 JUN 1945