
I've been using a PDF comparison tool (ExamDiff Pro) and I'm trying to figure out how to exclude any words that match a potential date. The dates in the document I am comparing look like this: "January 20 , 2014"

Could someone help me figure out the regex for this?

I've found answers to similar questions, but they were just different enough that I couldn't adapt them :/

Thanks!

David Rush

3 Answers


I'm not sure how your tool works, but here's one that should find exactly what you want with the sample you provided:

\w{3,9}?\s\d{1,2}?\s,\s\d{4}?


Part 1: \w{3,9}? -- This matches a run of 3 to 9 word characters, taking as few as possible (shortest month name: May, 3 letters; longest: September, 9 letters). Note that \w also accepts digits and underscores.
Part 2: \s -- This matches a single whitespace character, such as a blank space.
Part 3: \d{1,2}? -- This matches one or two digits (0-9), taking as few as possible (meant for the 1-31 day range).
Part 4: \s,\s -- This matches a whitespace character, followed by a comma and then another whitespace character.
Part 5: \d{4}? -- This matches a sequence of exactly 4 digits (meant for a four-digit year such as 2014). In most regex flavors the trailing ? changes nothing after a fixed count.
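
To see how the parts above fit together, here is a minimal sketch in Python's `re` module. It is not ExamDiff Pro itself, whose regex flavor may differ. The `GREEDY` variant, the `find_dates` helper, and the sample sentence are assumptions added for comparison only.

```python
import re

# Pattern copied from this answer, lazy "?" suffixes included.
LAZY = re.compile(r"\w{3,9}?\s\d{1,2}?\s,\s\d{4}?")
# Hypothetical variant without the lazy suffixes, for comparison.
GREEDY = re.compile(r"\w{3,9}\s\d{1,2}\s,\s\d{4}")

def find_dates(pattern, text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]

# Made-up sample using the "space before the comma" format from the question.
sample = "Signed on January 20 , 2014 and revised May 5 , 2015."

print(find_dates(LAZY, sample))    # ['January 20 , 2014', 'May 5 , 2015']
print(find_dates(GREEDY, sample))  # same result: the rest of the pattern forces the same match
```

Both versions return the same matches here because each quantified part is followed by something that must match, so laziness does not shorten the result. Neither matches "January 20, 2014", since the pattern requires whitespace before the comma.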

Is that sufficient for what you were looking for?

kayleeFrye_onDeck

I've never used ExamDiff, but judging from their regex help page I think I can help.

I think the following regex should get you dates in the format you specified.

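\w+\s\d{1,2}\s?,\s\d{4}

Explanation:

\w+      -- one or more word characters
\s       -- a whitespace character
\d{1,2}  -- 1 or 2 digits
\s?      -- an optional whitespace character (your sample has a space before the comma)
,        -- a literal comma
\s       -- another whitespace character
\d{4}    -- 4 digits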
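
Because `\w+` accepts any word, not only month names, a stricter variant can spell the months out. The following is a minimal sketch in Python rather than in ExamDiff Pro. The names `MONTHS`, `DATE_RE`, and `strip_dates`, and the `<DATE>` placeholder, are hypothetical. It replaces dates before comparing text, which only approximates what an ignore filter in a diff tool would do.

```python
import re

# Full English month names; abbreviations are deliberately not handled here.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Month name, 1-2 digit day, comma with optional surrounding spaces, 4-digit year.
DATE_RE = re.compile(r"\b(?:" + MONTHS + r")\s+\d{1,2}\s*,\s*\d{4}\b")

def strip_dates(text, placeholder="<DATE>"):
    """Replace every matched date with a fixed placeholder."""
    return DATE_RE.sub(placeholder, text)

print(strip_dates("Due January 20 , 2014."))  # Due <DATE>.
print(strip_dates("Section 12 , 2014"))       # unchanged: no month name
```
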
Nathan

You can try ^(0[1-9]|1[0-2])$ for checking a two-digit month from 01 to 12
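
A quick way to check a two-digit month pattern is to run it over every candidate. This is a minimal sketch in Python, assuming the alternation `0[1-9]|1[0-2]` and using `fullmatch` in place of the `^` and `$` anchors. The names `MONTH_RE` and `is_month` are hypothetical.

```python
import re

# 01-09 or 10-12; fullmatch requires the whole string to match.
MONTH_RE = re.compile(r"0[1-9]|1[0-2]")

def is_month(s):
    """Return True if s is a two-digit month number from 01 to 12."""
    return MONTH_RE.fullmatch(s) is not None

valid = [f"{n:02d}" for n in range(100) if is_month(f"{n:02d}")]
print(valid)  # ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
```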

Mayur Bhavsar