I am trying to write a regex to identify some dates.
The string I am working on is:
String = ('these are just rubbish 11-2-2222, 24-3-1695-194475 12-13-1111, 32/11/2000 '
          'these are dates 4-02-2011, 12/12/1990, 31-11-1690, 11 July 1990, 7 Oct 2012 '
          'these are actual deal- by 12 December six people died and in June 2000 he told, by 5 July 2001, he will leave.')
The regex looks like this:
re.findall('(\
[\b, ]\
([1-9]|0[1-9]|[12][0-9]|3[01])\
[-/.\s+]\
(1[1-2]|0[1-9]|[1-9]|Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sept|September|Oct|October|Nov|November|Dec|December)\
(?:[-/.\s+](1[0-9]\d\d|20[0-2][0-5]))?\
[^\da-zA-Z])',String)
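Every line of that pattern ends in a backslash inside the string literal, so Python joins the lines into one long pattern. For readability, here is the same regex written as a raw string with re.VERBOSE. This is only a reformatting sketch: the names DATE_RE and TEXT are mine, and TEXT assumes the three lines of the sample are separated by a space, which the ' 7 Oct 2012 ' result suggests. Laying it out like this makes two details visible. Inside a character class, \b means backspace, not a word boundary, so [\b, ] really means "backspace, comma or space". And [-/.\s+] matches a single whitespace character or a literal +, not "one or more spaces".

```python
import re

# The question's pattern, unchanged, laid out with re.VERBOSE.
# Whitespace inside [...] is still significant in verbose mode.
DATE_RE = re.compile(r"""
    (                                          # group 1: whole match, including the delimiters
        [\b, ]                                 # one backspace, comma or space before the day
        ([1-9]|0[1-9]|[12][0-9]|3[01])         # group 2: day
        [-/.\s+]                               # separator: - / . whitespace or a literal +
        (1[1-2]|0[1-9]|[1-9]                   # group 3: month as a number or a name
         |Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June
         |Jul|July|Aug|August|Sept|September|Oct|October|Nov|November|Dec|December)
        (?:[-/.\s+](1[0-9]\d\d|20[0-2][0-5]))?  # optional separator + group 4: year
        [^\da-zA-Z]                            # one trailing character that is not a letter or digit
    )
""", re.VERBOSE)

# Sample text; assumes a space between the three original lines.
TEXT = ("these are just rubbish 11-2-2222, 24-3-1695-194475 12-13-1111, 32/11/2000 "
        "these are dates 4-02-2011, 12/12/1990, 31-11-1690, 11 July 1990, 7 Oct 2012 "
        "these are actual deal- by 12 December six people died and in June 2000 he told, "
        "by 5 July 2001, he will leave.")

for match in DATE_RE.findall(TEXT):
    print(match)
```

Running it prints the same nine tuples that the one-line version returns. The raw string also avoids the invalid-escape warnings that newer Python versions emit for \s and \d in an ordinary string literal.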
The output I get is:
[(' 11-2-', '11', '2', ''),
(' 24-3-1695-', '24', '3', '1695'),
(' 4-02-2011,', '4', '02', '2011'),
(' 12/12/1990,', '12', '12', '1990'),
(' 31-11-1690,', '31', '11', '1690'),
(' 11 July 1990,', '11', 'July', '1990'),
(' 7 Oct 2012 ', '7', 'Oct', '2012'),
(' 12 December ', '12', 'December', ''),
(' 5 July 2001,', '5', 'July', '2001')]
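One way to see where the first two results come from is to look at the year group and the last character of each match. The sketch below uses the question's pattern joined onto one line as raw strings. explain is a hypothetical helper, for illustration only. In 11-2-2222 the year group does not participate at all: 2222 fits neither 1[0-9]\d\d nor 20[0-2][0-5], so the engine skips the optional group and the final [^\da-zA-Z] consumes the "-". In 24-3-1695-194475 the year group does match 1695, and the final class again consumes the "-" that follows it. So the optional group is only part of the story. The trailing class accepting a separator is what lets both matches through.

```python
import re

# The question's pattern, joined onto one line (same regex, raw strings).
PATTERN = re.compile(
    r"([\b, ]([1-9]|0[1-9]|[12][0-9]|3[01])[-/.\s+]"
    r"(1[1-2]|0[1-9]|[1-9]|Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|"
    r"Jul|July|Aug|August|Sept|September|Oct|October|Nov|November|Dec|December)"
    r"(?:[-/.\s+](1[0-9]\d\d|20[0-2][0-5]))?[^\da-zA-Z])"
)


def explain(fragment):
    """Return (whole match, year group or None, last character consumed)."""
    m = PATTERN.search(fragment)
    if m is None:
        return None
    return m.group(1), m.group(4), m.group(1)[-1]


print(explain(" 11-2-2222,"))         # (' 11-2-', None, '-')
print(explain(" 24-3-1695-194475 "))  # (' 24-3-1695-', '1695', '-')
```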
Problems:
1. The first two outputs are wrong. They come from the optional year part, (?:[-/.\s+](1[0-9]\d\d|20[0-2][0-5]))?, which I added to handle cases like "12 December". How do I get rid of them?
2. The case "June 2000" is not handled by the expression. Can I extend the expression to handle this case without affecting the others?
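One possible direction is sketched below. It assumes that a date must not be glued to surrounding letters, digits or date separators. The sketch replaces the consuming [\b, ] and [^\da-zA-Z] with a lookbehind and a lookahead. A separator followed by more digits (the -2222 and -194475 tails) then makes the whole candidate fail instead of being absorbed. The sketch also adds a second alternative for a month name followed by a year. The building blocks are copied from the question's pattern. The names DAY, MONTH_NAME, find_dates and so on are only for illustration, and the pattern has only been checked against this one sample. Because the delimiters are no longer consumed, each result is just the date text.

```python
import re

# Building blocks copied from the question's pattern (hypothetical names).
DAY = r"(?:[1-9]|0[1-9]|[12][0-9]|3[01])"
MONTH_NAME = (r"(?:Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|"
              r"Jul|July|Aug|August|Sept|September|Oct|October|Nov|November|Dec|December)")
MONTH = rf"(?:1[1-2]|0[1-9]|[1-9]|{MONTH_NAME})"
SEP = r"[-/.\s+]"
YEAR = r"(?:1[0-9]\d\d|20[0-2][0-5])"

DATE_RE = re.compile(
    r"(?<![\w/.-])"                           # not glued to a word character or separator on the left
    rf"(?:{DAY}{SEP}{MONTH}(?:{SEP}{YEAR})?"  # day month [year]: 4-02-2011, 12 December
    rf"|{MONTH_NAME}{SEP}{YEAR})"             # month name + year: June 2000
    r"(?![\da-zA-Z]|[-/.]\d)"                 # not followed by a letter/digit or separator + digit
)


def find_dates(text):
    """Return the date substrings found in text (delimiters are not consumed)."""
    return [m.group(0) for m in DATE_RE.finditer(text)]


# Sample text; assumes a space between the three original lines.
TEXT = ("these are just rubbish 11-2-2222, 24-3-1695-194475 12-13-1111, 32/11/2000 "
        "these are dates 4-02-2011, 12/12/1990, 31-11-1690, 11 July 1990, 7 Oct 2012 "
        "these are actual deal- by 12 December six people died and in June 2000 he told, "
        "by 5 July 2001, he will leave.")

print(find_dates(TEXT))
# ['4-02-2011', '12/12/1990', '31-11-1690', '11 July 1990', '7 Oct 2012',
#  '12 December', 'June 2000', '5 July 2001']
```

The month-name-plus-year branch is restricted to names so that stray number pairs are not read as dates. Two pieces were kept as they are and may deserve a separate look. First, 20[0-2][0-5] only accepts 2000-2005, 2010-2015 and 2020-2025. Second, 1[1-2]|0[1-9]|[1-9] has no way to match month 10.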