I'm a newbie, so I'm sure this is something silly in my code. In my defense, I re-read the Python re documentation and searched around before asking, but I haven't found a duplicate question so far (which surprised me).
Outside of a DataFrame, my regex works in this example:
import re
x = "my best friend's birthday is 24 Jan 2001."
print(re.findall(r'\d{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s\d{2,4}', x))
The Anaconda console returns: ['24 Jan 2001']
But in my DataFrame (df1) I have the following:
index text
0 My birthday is 2/21/19
1 Your birthday is 4/1/20
2 my best friend's birthday is 24 Jan 2001.
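For anyone who wants to reproduce this, df1 can be rebuilt roughly as follows. This is a minimal sketch that assumes the default integer index and a single text column, as in the listing above.

```python
import pandas as pd

# Minimal reproduction of df1 with the three rows listed above.
# Double quotes are used because one row contains an apostrophe.
df1 = pd.DataFrame({
    'text': [
        "My birthday is 2/21/19",
        "Your birthday is 4/1/20",
        "my best friend's birthday is 24 Jan 2001.",
    ]
})
print(df1)
```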
When I run the following code:
df1['dates'] = df1['text'].str.extract(r'.*?(\d+[/-]\d+[/-]?\d*).*?|\d{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s\d+')
print(df1['dates'])
I get the following results:
dates
0 2/21/19
1 4/1/20
2 NaN
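To narrow this down, the same pattern can be tried with plain re, outside pandas. This is only a sketch: the names pattern, texts and results are made up for illustration, and it relies on str.extract reporting the capture groups of the pattern rather than the whole match.

```python
import re

# Same pattern as in the str.extract call, written as a raw string.
pattern = re.compile(
    r'.*?(\d+[/-]\d+[/-]?\d*).*?'
    r'|\d{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s\d+'
)

texts = [
    "My birthday is 2/21/19",
    "Your birthday is 4/1/20",
    "my best friend's birthday is 24 Jan 2001.",
]

# Record (whole match, capture group 1) for each row.
results = []
for text in texts:
    m = pattern.search(text)
    results.append((m.group(0), m.group(1)) if m else (None, None))

for text, (whole, group1) in zip(texts, results):
    print(repr(text), '-> whole match:', repr(whole), '| group 1:', repr(group1))
```

For the third row the pattern does match as a whole (24 Jan 2001), but capture group 1 comes back as None, which lines up with the NaN in the dates column.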
I've tried playing around with the parentheses, rereading the documentation, and making other tweaks, but those just resulted in endless errors. I'm sure it's an obvious mistake, but I don't see it. Can someone help? Thank you.