
I am looking for a more efficient regular expression for a Qt project.

The expression below matches these date strings:

8 June 2009
2009-06-08
1/31/1971
ca. 1971
circa 1971
about 1971
abt 1971
before 1971
bef 1971
bante 1971
after 1971
aft 1971
apost 1971
between 1970 and 2000
bet 1970 and 2000
btw 1970 and 2000
1970-2000
from 1970 to 2000
est 2000
2000 or 2001
2000 | 2001
2019
1.10.2019
June 2009
6/2019

The expression:

```
^((?:Jan|January|Feb|February|Mar|March|Apr|April|May|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)[\t ]+[0-9]{4})|^((?:3[01]|[12][0-9]|0?[1-9])[\t ]+(?:Jan|January|Feb|February|Mar|March|Apr|April|May|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)[\t ]+[0-9]{4})|^([0-9]{4} or [0-9]{4})|(^(?:1[0-2]|0?[1-9])[\/.-][0-9]{4})|(^[0-9]{4}-[0-9]{4})|(^from[\t ]+[0-9]{4}[\t ]+to[\t ]+[0-9]{4})|(^btw[\t ]+[0-9]{4}[\t ]+and[\t ]+[0-9]{4})|(^between[\t ]+[0-9]{4}[\t ]+and[\t ]+[0-9]{4})|(^[\t\n\v\f\r ]*([0-9]{4})$)|(^[0-9]{4}[\t ]+\|[\t ]+[0-9]{4}\b)|(^(?:1[0-2]|0?[1-9])[\/.-](?:3[01]|[12][0-9]|0?[1-9])[\/.-][0-9]{4})|(^est[\t ]+[0-9]{4})|(^after[\t ]+[0-9]{4})|(^before[\t ]+[0-9]{4})|(^bef[\t ]+[0-9]{4})|(^abt[\t ]+[0-9]{4})|(^about[\t ]+[0-9]{4})|(^circa[\t ]+[0-9]{4})|(^ca[^\n\r][\t ]+[0-9]{4})|(^bante[\t ]+[0-9]{4})|(^aft[\t ]+[0-9]{4})|(^apost[\t ]+[0-9]{4})|(^btw[\t ]+[0-9]{4}[\t ]+and[\t ]+[0-9]{4})|(^bet[\t ]+[0-9]{4}[\t ]+and[\t ]+[0-9]{4})|(^[0-9]{4}-(?:1[0-2]|0?[1-9])-(?:3[01]|[12][0-9]|0?[1-9]))
```

I verified the matches on Rubular: https://rubular.com/r/boBqo3h0KBt7oA
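Rubular runs Ruby's regex engine, while Qt's `QRegularExpression` is built on PCRE, so results can differ between flavors. One option is to keep the sample list as an automated check. Below is a minimal sketch that uses C++ `std::regex` (ECMAScript grammar) as a portable stand-in. `kOriginalPattern`, `kSamples` and `unmatchedSamples` are hypothetical names. To test the engine Qt actually uses, the loop would need porting to `QRegularExpression::match`.

```cpp
#include <regex>
#include <string>
#include <vector>

// The pattern from the question, copied unchanged into a raw string literal.
const char* const kOriginalPattern = R"(^((?:Jan|January|Feb|February|Mar|March|Apr|April|May|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)[\t ]+[0-9]{4})|^((?:3[01]|[12][0-9]|0?[1-9])[\t ]+(?:Jan|January|Feb|February|Mar|March|Apr|April|May|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)[\t ]+[0-9]{4})|^([0-9]{4} or [0-9]{4})|(^(?:1[0-2]|0?[1-9])[\/.-][0-9]{4})|(^[0-9]{4}-[0-9]{4})|(^from[\t ]+[0-9]{4}[\t ]+to[\t ]+[0-9]{4})|(^btw[\t ]+[0-9]{4}[\t ]+and[\t ]+[0-9]{4})|(^between[\t ]+[0-9]{4}[\t ]+and[\t ]+[0-9]{4})|(^[\t\n\v\f\r ]*([0-9]{4})$)|(^[0-9]{4}[\t ]+\|[\t ]+[0-9]{4}\b)|(^(?:1[0-2]|0?[1-9])[\/.-](?:3[01]|[12][0-9]|0?[1-9])[\/.-][0-9]{4})|(^est[\t ]+[0-9]{4})|(^after[\t ]+[0-9]{4})|(^before[\t ]+[0-9]{4})|(^bef[\t ]+[0-9]{4})|(^abt[\t ]+[0-9]{4})|(^about[\t ]+[0-9]{4})|(^circa[\t ]+[0-9]{4})|(^ca[^\n\r][\t ]+[0-9]{4})|(^bante[\t ]+[0-9]{4})|(^aft[\t ]+[0-9]{4})|(^apost[\t ]+[0-9]{4})|(^btw[\t ]+[0-9]{4}[\t ]+and[\t ]+[0-9]{4})|(^bet[\t ]+[0-9]{4}[\t ]+and[\t ]+[0-9]{4})|(^[0-9]{4}-(?:1[0-2]|0?[1-9])-(?:3[01]|[12][0-9]|0?[1-9])))";

// The sample strings listed in the question.
const std::vector<std::string> kSamples = {
    "8 June 2009", "2009-06-08", "1/31/1971", "ca. 1971", "circa 1971",
    "about 1971", "abt 1971", "before 1971", "bef 1971", "bante 1971",
    "after 1971", "aft 1971", "apost 1971", "between 1970 and 2000",
    "bet 1970 and 2000", "btw 1970 and 2000", "1970-2000",
    "from 1970 to 2000", "est 2000", "2000 or 2001", "2000 | 2001", "2019",
    "1.10.2019", "June 2009", "6/2019"};

// Returns the samples the pattern fails to find; empty means all of them pass.
std::vector<std::string> unmatchedSamples(const std::regex& re) {
    std::vector<std::string> misses;
    for (const std::string& s : kSamples) {
        if (!std::regex_search(s, re)) misses.push_back(s);
    }
    return misses;
}
```

Under this engine all samples match. The same check also exposes a quirk. Because `ca[^\n\r]` accepts any character after `ca`, `cat 1971` matches, while `ca 1971` with a single space does not.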

The only solution I have found relies on a long chain of alternated groups. Is this a reasonable approach for matching a list of date strings?
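The alternation approach can stay while the number of branches shrinks. Shared prefixes and suffixes can be factored out: month names can take optional suffixes (this also removes the duplicated `May|May`), the one-year qualifiers can share a single alternation, and the two-year forms can be grouped together. The sketch below assumes `std::regex` (ECMAScript grammar) as a stand-in for Qt's engine. All names in it are hypothetical. It is not a drop-in replacement, because it behaves differently in a few places. `ca\.?` replaces `ca[^\n\r]`, the `or` form accepts tabs or several spaces, and trailing text is rejected. Any speed gain would need measuring.

```cpp
#include <regex>
#include <string>

// Hypothetical building blocks for std::regex (ECMAScript grammar).
namespace {
const std::string Y = "[0-9]{4}";                          // four-digit year
const std::string W = "[\\t ]+";                           // spaces or tabs
const std::string kMonthNum = "(?:1[0-2]|0?[1-9])";        // 1-12, optional 0
const std::string kDayNum = "(?:3[01]|[12][0-9]|0?[1-9])"; // 1-31, optional 0

// Month names with optional suffixes instead of separate short and long
// branches; this also removes the duplicated "May|May".
const std::string kMonthName =
    "(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
    "|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)";

// "June 2009" and "8 June 2009": the leading day is optional.
const std::string kMonthYear = "(?:" + kDayNum + W + ")?" + kMonthName + W + Y;

// All "<qualifier> <year>" branches in one alternation. Assumption: "ca\.?"
// replaces the original "ca[^\n\r]", which accepts any character after "ca".
const std::string kQualified =
    "(?:ca\\.?|circa|about|abt|before|bef|bante|after|aft|apost|est)" + W + Y;

// Two-year forms: "between/bet/btw Y and Y", "from Y to Y", "Y-Y", "Y or Y"
// and "Y | Y".
const std::string kRange =
    "(?:(?:between|bet|btw)" + W + Y + W + "and|from" + W + Y + W + "to)" + W + Y +
    "|" + Y + "-" + Y + "|" + Y + W + "(?:or|\\|)" + W + Y;

// Numeric forms: "6/2019", "1/31/1971", "1.10.2019", "2019", "2009-06-08".
const std::string kNumeric =
    kMonthNum + "[/.-](?:" + kDayNum + "[/.-])?" + Y +
    "|" + Y + "(?:-" + kMonthNum + "-" + kDayNum + ")?";
}  // namespace

// regex_match must consume the whole string, so no ^ or $ is needed.
bool matchesGrouped(const std::string& s) {
    static const std::regex re(
        "(?:" + kMonthYear + "|" + kQualified + "|" + kRange + "|" + kNumeric + ")");
    return std::regex_match(s, re);
}
```

Building the pattern from named pieces keeps each date form readable, and a new qualifier becomes a one-word change.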

Thanks in advance!

Jarod42
Tim P
  • First off, _bravo_, that is a beast of a pattern. If the engine is not doing backtracking, alternating groups aren't that costly. If performance becomes problematic, consider using a regex engine that can generate DFAs like [RE2](https://github.com/google/re2). But if what you have just works, why mess with it? – parktomatomi Nov 08 '19 at 01:53
  • If you are looking for speed you could have a fast match that isolates elements that are _probably a date_ and only apply a rigorous regex to those. – Galik Nov 08 '19 at 01:58
  • Is it possible to specify some kind of general pattern that wraps all of this? Something like `01/10/2009-10/10/2009` would be able to catch intervals and single days and you wouldn't need this big pattern. – Daniel Nov 08 '19 at 16:17
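Galik's two-stage idea can be sketched as follows. A cheap, regex-free scan rejects strings that cannot be dates, and the full pattern runs only on the rest. The filter assumes that every accepted format contains four consecutive digits, which is true for every sample in the question. `hasFourDigitRun` and `isDate` are hypothetical names. `std::regex` stands in for whatever engine the project uses, and the `strict` parameter could be the original pattern, compiled once and reused.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Cheap pre-check: every sample format contains a four-digit year, so a
// string without four consecutive digits is rejected without any regex work.
bool hasFourDigitRun(const std::string& s) {
    int run = 0;
    for (unsigned char c : s) {
        run = std::isdigit(c) ? run + 1 : 0;
        if (run == 4) return true;
    }
    return false;
}

// Two-stage check: the fast filter first, the rigorous regex only for
// strings that are probably dates.
bool isDate(const std::string& s, const std::regex& strict) {
    return hasFourDigitRun(s) && std::regex_search(s, strict);
}
```

This only pays off when most inputs are not dates. If nearly every string contains a year, the filter just adds a pass over each string.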

1 Answer


I will keep the regex as posted and add narrower, focused regexes as an additional check where necessary. Thanks for your comments.
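A focused check of this kind could, for example, catch calendar-impossible dates that the broad pattern accepts, such as `2/31/1971`. The sketch below uses a small regex to capture the fields and validates them in code. `isValidMonthDayYear` is a hypothetical name, and `std::regex` stands in for Qt's engine. It reads the fields month-first, as the original pattern does, so `1.10.2019` is 10 January. If day-first input such as `1.10.2019` meaning 1 October is intended, the two captures would need swapping.

```cpp
#include <regex>
#include <string>

// Follow-up check for the "M/D/YYYY" form (separators / . or -): capture the
// fields, then verify the month range and the day against the month length.
bool isValidMonthDayYear(const std::string& s) {
    static const std::regex re("([0-9]{1,2})[/.-]([0-9]{1,2})[/.-]([0-9]{4})");
    std::smatch m;
    if (!std::regex_match(s, m, re)) return false;
    const int month = std::stoi(m[1].str());
    const int day = std::stoi(m[2].str());
    const int year = std::stoi(m[3].str());
    if (month < 1 || month > 12 || day < 1) return false;
    static const int kDaysInMonth[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    // Gregorian leap-year rule.
    const bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    const int maxDay = (month == 2 && leap) ? 29 : kDaysInMonth[month - 1];
    return day <= maxDay;
}
```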

Tim P