I'm trying to match dates in a pandas DataFrame with 500 entries using regex.
The dates can appear in the following formats:
04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009
Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009
Feb 2009; Sep 2009; Oct 2010
6/2008; 12/2009
2009; 2010
dates[dates[0].str.contains(r'(?P<year>\d?\d?\d\d)')].shape
returns a shape of (500, 1), but
dates[dates[0].str.contains(r'((?P<day>(\d?\d)?(\s|-|/|th|st|nd)?)??P<year>(\d?\d?\d\d))')].shape
returns a shape of (0, 1). The day group is optional, so shouldn't it still match the year group?
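In case it helps, the behaviour can also be reproduced without pandas. This is only a minimal sketch: the `samples` list is a hypothetical handful of strings taken from the formats above (not my real data), the variable names are just for illustration, and `re.search` stands in for `str.contains`.

```python
import re

# Hypothetical sample strings, one per format family listed in the question.
samples = [
    "04/20/2009",
    "4/3/09",
    "Mar 20th, 2009",
    "20 March, 2009",
    "Feb 2009",
    "6/2008",
    "2009",
]

# The two patterns from the question, copied as written.
year_only = r'(?P<year>\d?\d?\d\d)'
day_and_year = r'((?P<day>(\d?\d)?(\s|-|/|th|st|nd)?)??P<year>(\d?\d?\d\d))'

# Keep the strings in which each pattern is found anywhere (search, not full match).
year_hits = [s for s in samples if re.search(year_only, s)]
day_year_hits = [s for s in samples if re.search(day_and_year, s)]

print(len(year_hits), len(day_year_hits))  # prints "7 0"
```

The first pattern finds every sample, while the second finds none, which mirrors the (500, 1) versus (0, 1) shapes I get on the full data.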