1

I have a date string like Thursday, December 13, 2018 (i.e., DAY, MONTH dd, yyyy), and I need to validate it with a regular expression.

The regex should reject an incorrect day or month. For example, Muesday, December 13, 2018 and Thursday, December 32, 2018 should both be marked invalid.

So far I have written expressions for the ", ", "dd", and "yyyy" parts. I don't understand how to customize the regex so that it accepts only correct day and month names.

My attempt:

^([something would come over here for day name]day)([\,]|[\, ])(something would come over here for month name)(0?[1-9]|[12][0-9]|3[01])([\,]|[\, ])([12][0-9]\d\d)$

Thanks.

EDIT: I only allow years from 1000 to 2999. Validating leap years does not matter.

Kshitiz
  • I would try to convert it to `datetime`; if that succeeds, the string is valid. Regexes are much harder to read, understand, extend, and debug. See, for example: https://stackoverflow.com/questions/466345/converting-string-into-datetime – Aaron_ab Dec 13 '18 at 10:04
  • You should definitely parse the string as a date and check whether it is valid. There are many libraries for exactly that (e.g. `dateFns`, `moment.js`, etc.). Otherwise, how would you validate complex cases like 31. June, or a wrong weekday name, using only a regex? – ssc-hrep3 Dec 13 '18 at 10:25

2 Answers

2

You can try a library that already implements the regexes for "complex" cases like yours. It is called datefinder.

Its author has already done the work of finding many kinds of dates in text:

https://github.com/akoumjian/datefinder

To install it: pip install datefinder

import datefinder

# A string that spans several lines needs triple quotes
string_with_dates = """entries are due by January 4th, 2017 at 8:00pm
    created 01/15/2005 by ACME Inc. and associates."""

matches = datefinder.find_dates(string_with_dates)

for match in matches:
    print(match)

# Output
2017-01-04 20:00:00
2005-01-15 00:00:00

To detect misspelled words like "Muesday", you can filter your text with a spellchecker such as PyEnchant:

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> print(d.check("Monday"))
True
>>> print(d.check("Muesday"))
False
>>> print(d.suggest("Muesday"))
['Tuesday', 'Domesday', 'Muesli', 'Wednesday', 'Mesdames']
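
If you would rather not install a spellchecker, the same idea can be sketched with the standard library alone, since the set of valid day and month names is small and fixed. This is only a rough sketch: `check_name`, `DAYS` and `MONTHS` are names I made up for illustration, and `difflib.get_close_matches` stands in for the spellchecker's suggestions.

```python
import difflib

# Hard-coded English names (an assumption: the input is always in English)
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def check_name(word, valid_names):
    """Return (is_valid, suggestion) for a day or month name."""
    if word in valid_names:
        return True, None
    # Pick the closest valid name, if any is similar enough
    close = difflib.get_close_matches(word, valid_names, n=1)
    return False, (close[0] if close else None)


print(check_name("Monday", DAYS))   # (True, None)
print(check_name("Muesday", DAYS))  # (False, 'Tuesday')
```

Unlike a general dictionary, this only ever suggests a real day or month name.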
snakecharmerb
LaSul
1

Regex is not the right tool for this problem!

But here is some example code showing how the "something would come over here for day name" section of your pattern could be written. I also added an example of how to use strptime(), which is a much better solution in your case:

import re
from datetime import datetime

s = """
Thursday, December 13, 2018
Muesday, December 13, 2018
Monday, January 13, 2018
Thursday, December 32, 2018
"""

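# With re.VERBOSE, plain whitespace in the pattern is ignored,
# so each literal space is written as [ ].
pat = r"""
^
(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)
,[ ]
(January|February|March|April|May|June|July|August|September|October|November|December)
[ ]
(0?[1-9]|[12][0-9]|3[01])
,[ ]
([12][0-9]\d\d)
$
"""

# Regex approach: prints only the lines that match the pattern
for match in re.finditer(pat, s, re.VERBOSE | re.MULTILINE):
    print(match.group(0))

# strptime approach: raises ValueError for anything it cannot parse
for row in s.splitlines():
    if not row:
        continue  # skip the blank lines at the start and end of s
    try:
        print(datetime.strptime(row, '%A, %B %d, %Y'))
    except ValueError:
        print("'%s' is not valid" % row)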
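
One caveat: as far as I know, strptime() only checks that the weekday is a valid name, not that it agrees with the rest of the date, so Monday, January 13, 2018 parses fine even though that day was a Saturday. A minimal sketch of a stricter check could look like the following. `is_valid_date` and `DAYS` are hypothetical names of mine, and the 1000 to 2999 year range is taken from the question's EDIT.

```python
from datetime import datetime

# Index matches datetime.weekday(): Monday is 0, Sunday is 6
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")


def is_valid_date(text):
    """True if text is a real date whose weekday name matches the date."""
    try:
        parsed = datetime.strptime(text, "%A, %B %d, %Y")
    except ValueError:
        return False  # bad day/month name, or day out of range
    if not 1000 <= parsed.year <= 2999:
        return False  # year range requested in the question
    # Compare the weekday name in the text with the real weekday
    given_day = text.split(",")[0].strip().lower()
    return given_day == DAYS[parsed.weekday()].lower()


for row in ("Thursday, December 13, 2018",
            "Muesday, December 13, 2018",
            "Monday, January 13, 2018",
            "Thursday, December 32, 2018"):
    print(row, "->", is_valid_date(row))
```

Only the first row should come out as valid. Note that %A and %B depend on the current locale, so this assumes an English (or default C) locale.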
UlfR