As part of a web service response, I have to extract date strings that so far have taken either of these formats:

  • 06 Mar 2015-10:24 EST
  • 06 Mar 2015-10:24

(I've got no control over the service itself; there's a whole assortment of nonstandard date formats there plus [inaccurate] localization, so you'll have to trust me that in the context I need a regex.)

So far, I've been using the following pattern to pull out the bits and pieces I need:

@"(((\\d{1,2})\\s([a-z]+)\\s(\\d{4}))\\-(\\d+:\\d+))(\\s([a-z]{3}))?$"

However, yet another new format has been introduced, without the time:

  • 06 Mar 2015

This seemed like a simple modification. I created a new group around the hyphen+time ("-10:24") atoms, and added the "zero or one" quantifier ("?") to get this:

@"(((\\d{1,2})\\s([a-z]+)\\s(\\d{4}))(\\-(\\d+:\\d+))?)(\\s([a-z]{3}))?$"

But the expression now fails on all of the above input strings.

Interestingly, I've tried replacing the "?" with other quantifiers, and discovered any quantifier that suggests that at least one of those atoms should be present (e.g., (\\-(\\d+:\\d+))+, (\\-(\\d+:\\d+)){1,2}) works, whereas those that suggest even the possibility that it might not be there (e.g., (\\-(\\d+:\\d+))*, (\\-(\\d+:\\d+)){0,1}) fail.

I can come up with at least a couple of clumsy workarounds for this, but in the interest of clean code, am I…

  1. Messing up the regular expression? (I don't think so, I've tested this at regex101.com and it works.)
  2. Missing something in the NSRegularExpression documentation?
  3. Bumping into an actual bug in the class (in which case I'll go ahead and report it to Apple)?

Thanks.

RiqueW
  • That's strange. [regex101](https://regex101.com/r/iY2wM1/1) is passing indeed (with case insensitivity), but it's not the same regex engine anyway. – Lucas Trzesniewski Mar 06 '15 at 16:39
  • This is just guessing since I'm not able to test it with NSRegularExpression, nor was I able to find any docs on it. But one thing you could try is something along the lines of `@"(?:(\\d{1,2})\\s([a-z]+)\\s(\\d{4}))((?:\\-\\d+:\\d+)?)((?:\\s[a-z]{3})?)$"`, more or less the same expression except it has a finite (albeit some potentially empty) number of capture groups . The idea being that NSRegularExpression doesn't like it when it doesn't know the number of groups there will be in the result. Not ideal since you capture more than you want, but you'd be able to test if this is the case. – rvalvik Mar 06 '15 at 17:15
  • Right you are, Lucas. I forgot to mention that I'm using the case-insensitive flag in my code and at regex101. – RiqueW Mar 06 '15 at 17:15
  • @rvalvik: That looks like an answer to me; why didn't you post it as such? Comments are meant to be used for quick questions or suggestions, usually about the question itself. – Alan Moore Mar 06 '15 at 17:42
  • Sorry, everyone… as with a lot of mysterious behaviour, this one didn't have anything to do with the line of code that seemed to be the culprit. The second input string was indeed working, but throwing an error when the code was trying to access substringWithRange: using one of the 'empty' ranges (with its location property set to LONG_MAX, it would seem). The error was getting handled(/ignored) by an overly forgiving try/catch block at a higher scope! – RiqueW Mar 06 '15 at 18:19
  • Sharing with anyone reading this thread: a developer friend has pointed out that the proper way to check for empty ranges in an NSTextCheckingResult is to test that `range.location != NSNotFound`. (Full confession: I'm a .NET developer only recently migrated to iOS.) – RiqueW Mar 06 '15 at 18:57
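
The `NSNotFound` check from the last comment could look roughly like the sketch below. It uses the modified pattern from the question with the case-insensitive option; the helper names `CaptureOrNil` and `TimePart` are hypothetical and do not come from the original code. Counting opening parentheses, group 7 of that pattern is the time ("10:24").

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of a capture group, or nil when the
// group did not participate in the match.
static NSString *CaptureOrNil(NSTextCheckingResult *match, NSUInteger idx, NSString *input) {
    NSRange range = [match rangeAtIndex:idx];
    // A group that did not participate reports {NSNotFound, 0}; passing that
    // range to substringWithRange: would raise an exception.
    if (range.location == NSNotFound) {
        return nil;
    }
    return [input substringWithRange:range];
}

// Hypothetical helper: extracts the time portion (group 7), or nil if the
// input has no time or does not match at all.
static NSString *TimePart(NSString *input) {
    NSString *pattern = @"(((\\d{1,2})\\s([a-z]+)\\s(\\d{4}))(\\-(\\d+:\\d+))?)(\\s([a-z]{3}))?$";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:input options:0 range:NSMakeRange(0, input.length)];
    if (match == nil) {
        return nil;
    }
    return CaptureOrNil(match, 7, input);
}

int main(void) {
    @autoreleasepool {
        for (NSString *input in @[@"06 Mar 2015-10:24 EST", @"06 Mar 2015-10:24", @"06 Mar 2015"]) {
            NSLog(@"%@ -> time: %@", input, TimePart(input));
        }
    }
    return 0;
}
```

With this guard in place, the date-only input simply yields `nil` for the time instead of raising an exception from `substringWithRange:`.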

1 Answer

The meaning of your new regex has changed: in the original regex, EST has been optional; however, it becomes mandatory in the new regex if the dash is discovered.

If you would like to make the time portion optional without changing the meaning of the rest of your expression, add an optional non-capturing group around the time portion, like this:

@"((\\d{1,2})\\s([a-z]+)\\s(\\d{4}))(?:-(\\d+:\\d+)(?:(\\s([a-z]{3}))?))?$"

I used a non-capturing group (?:...) to preserve group numbering from the original expression.
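
As a rough illustration of how the groups in the pattern above are numbered, here is a minimal sketch. Counting only capturing parentheses, the groups are 1 = date, 2 = day, 3 = month, 4 = year, 5 = time, 6 = space plus zone, 7 = zone. The helper name `ParseDate` and the dictionary keys are assumptions made for this example, and the case-insensitive option is assumed as in the question.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: maps the pattern's capture groups to named fields.
// Fields whose group did not participate in the match are left out.
static NSDictionary<NSString *, NSString *> *ParseDate(NSString *input) {
    NSString *pattern = @"((\\d{1,2})\\s([a-z]+)\\s(\\d{4}))(?:-(\\d+:\\d+)(?:(\\s([a-z]{3}))?))?$";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:input options:0 range:NSMakeRange(0, input.length)];
    if (match == nil) {
        return nil;
    }
    // Group indexes follow the opening parentheses; (?:...) groups are not counted.
    NSDictionary<NSString *, NSNumber *> *groups =
        @{@"date": @1, @"day": @2, @"month": @3, @"year": @4, @"time": @5, @"zone": @7};
    NSMutableDictionary<NSString *, NSString *> *fields = [NSMutableDictionary dictionary];
    [groups enumerateKeysAndObjectsUsingBlock:^(NSString *name, NSNumber *idx, BOOL *stop) {
        NSRange range = [match rangeAtIndex:idx.unsignedIntegerValue];
        // Skip groups that did not participate ({NSNotFound, 0}).
        if (range.location != NSNotFound) {
            fields[name] = [input substringWithRange:range];
        }
    }];
    return fields;
}

int main(void) {
    @autoreleasepool {
        for (NSString *input in @[@"06 Mar 2015-10:24 EST", @"06 Mar 2015-10:24", @"06 Mar 2015"]) {
            NSLog(@"%@ -> %@", input, ParseDate(input));
        }
    }
    return 0;
}
```

Because the zone group is nested inside the optional time group, this pattern only accepts a zone when a time is present; an input such as "06 Mar 2015 EST" would not match at all.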

Sergey Kalinichenko
  • dasblinkenlight: does this work on your end? I tried plugging it into my code, and the same bug seemed to be exposed. If I change the final "?" to "+", then it matches the first input above (but of course, neither of the others). However, as is, it matches none of them. No matter what, though, I'm eternally grateful for being introduced to non-capturing groups. I can't believe I've been writing regexes for as long as I have without knowing what these are. Talk about clean output!! – RiqueW Mar 06 '15 at 17:36
  • @RiqueW I need to get to my Mac to try this out. I'll let you know what's happening. – Sergey Kalinichenko Mar 06 '15 at 17:39
  • dasblinkenlight: Don't bother. Please see my comment above. I'm accepting your answer, though, because A) it's an improvement on my pattern; B) your unintentional tip about non-capturing groups has definitely improved the legibility (quality, even?) of the surrounding code; C) I'm impressed that you were able to edit that pattern while not at your computer!! Thanks again. – RiqueW Mar 06 '15 at 18:22