3

I am trying to write a regex in Python that identifies dates written in British order (day-month-year).

To test it, I wrote a string containing both valid and invalid dates:

string='these are just rubbish 01-13- 00-00- 44-44- 11-2-2222 24-3-1695abc 12-13-1111 32/11/2000\
        these are actual dates -- 4-02-2011 12/12/1990 31-11-1690  11 July 1990 7 Oct 2012\
        these are actual deal-- by 12 December six people died and  by 18 Nov 19902.00 dollar was spent\
        anomalies -- are he gave June 2000 bucks in 5 July. The shares rose 5% on 5 November 1999.'

import re

re.findall('(\
([1-9]|0[1-9]|[12][0-9]|3[01])\
[-/\s+]\
(1[1-2]|0[1-9]|[1-9]|Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|\
Aug|August|Sept|September|Oct|October|Nov|November|Dec|December)\
[-/\s+]\
(1[0-9]\d\d|20[0-2][0-5])\
[^\da-zA-Z])', string)

The output I get is given below:

[('2/11/2000 ', '2', '11', '2000'),
 ('4-02-2011 ', '4', '02', '2011'),
 ('12/12/1990 ', '12', '12', '1990'),
 ('31-11-1690 ', '31', '11', '1690'),
 ('11 July 1990 ', '11', 'July', '1990'),
 ('7 Oct 2012 ', '7', 'Oct', '2012'),
 ('5 November 1999.', '5', 'November', '1999')]

The regex seems to work, but there are a few dates it fails to identify:

by **12 December** six people
by **18 Nov** 19902.00 dollar

How can I modify the regex so that it identifies the above dates too?

Steffi Keran Rani J
Sam

2 Answers

1

It seems your regular expression only recognizes dates that include a year.

Change the pattern so that the year part is optional (that is, everything that follows 'December' or 'Nov').

1

What you're asking is to make the year optional. So you should surround your year part [-/\s+](1[0-9]\d\d|20[0-2][0-5]) with an optional non-capturing group:

(?:[-/\s+](1[0-9]\d\d|20[0-2][0-5]))?
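
As a rough illustration, here is that change applied to the pattern from the question, with every other sub-pattern copied unchanged. The name `DATE_RE` and the shortened sample text are made up for this sketch; it is not a complete date validator.

```python
import re

# The question's pattern, with "separator + year" wrapped in an optional
# non-capturing group. DATE_RE is a hypothetical name for this sketch.
DATE_RE = re.compile(
    r"(([1-9]|0[1-9]|[12][0-9]|3[01])"        # day
    r"[-/\s+]"                                 # separator
    r"(1[1-2]|0[1-9]|[1-9]|Jan|January|Feb|February|Mar|March|Apr|April|"
    r"May|Jun|June|Jul|July|Aug|August|Sept|September|Oct|October|"
    r"Nov|November|Dec|December)"              # month
    r"(?:[-/\s+](1[0-9]\d\d|20[0-2][0-5]))?"   # optional separator + year
    r"[^\da-zA-Z])"                            # trailing non-alphanumeric
)

# Shortened sample built from phrases in the question's test string.
sample = ("by 12 December six people died and by 18 Nov 19902.00 dollar "
          "was spent on 5 November 1999.")

matches = DATE_RE.findall(sample)
for m in matches:
    print(m)
# ('12 December ', '12', 'December', '')
# ('18 Nov ', '18', 'Nov', '')
# ('5 November 1999.', '5', 'November', '1999')
```

When the year is absent, its capture group comes back as an empty string. Note that an optional year also loosens the pattern: on the question's full test string it would, for example, report `11-2-` from the rubbish entry `11-2-2222`, so some extra filtering may still be needed.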

Also, it's matching 2/11/2000, which is the tail of the 'rubbish' date 32/11/2000 on your first line. Start the regex with \b so that a match can only begin on a word boundary.
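
The effect of the leading `\b` can be seen in isolation with a small sketch. The numeric-only pattern below is a simplified stand-in, not the question's full regex, and the names `UNANCHORED` and `ANCHORED` are hypothetical.

```python
import re

# Simplified numeric day/month/year pattern, for illustration only.
UNANCHORED = re.compile(r"(?:3[01]|[12][0-9]|0?[1-9])/(?:1[0-2]|0?[1-9])/\d{4}")

# Same pattern, but a match may only start on a word boundary.
ANCHORED = re.compile(r"\b" + UNANCHORED.pattern)

text = "rubbish 32/11/2000, real 12/12/1990"

print(UNANCHORED.findall(text))  # ['2/11/2000', '12/12/1990']
print(ANCHORED.findall(text))    # ['12/12/1990']
```

There is no word boundary between the `3` and the `2` of `32`, so the anchored version cannot start a match in the middle of that number.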

benshepherd
  • Hi Benshepherd, I have posted a small problem with the solution you provided, plus another scenario, in a separate question. You can find it at http://stackoverflow.com/questions/33145399/pyhton-regex-to-handle-different-types-of-date-written. Please provide some insights. – Sam Oct 15 '15 at 09:57
  • I think that's really part of the same problem. I think it's going to get closed as a duplicate, and you should modify this question to pose both problems. – benshepherd Oct 15 '15 at 10:00