I solved this recently when trying to match syntax similar to 1-4,5,9,20-25. The resulting regular expression is, admittedly, not simple:
/\G([0-9]++)(?:-([0-9]++))?+(?:,(?=[-0-9,]+$))?+/
This expression allowed me to collect all of the matches in the string incrementally: because of \G, each match has to start exactly where the previous one ended.
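As a rough illustration of the same idea, here is a Python sketch (Python is only a stand-in here, not part of the original setup). Python's re module has no \G, but Pattern.match(text, pos) only matches starting exactly at pos, which gives the same incremental behaviour. The function name parse_ranges is hypothetical. The possessive quantifiers are dropped for portability; because everything after the first number is optional, the greedy match is the same in this pattern.

```python
import re

# Same pattern as above without \G and without possessive quantifiers.
# Anchoring is done by Pattern.match(text, pos) instead of \G.
RANGE_ITEM = re.compile(r"([0-9]+)(?:-([0-9]+))?(?:,(?=[-0-9,]+$))?")


def parse_ranges(text):
    """Hypothetical helper: turn '1-4,5,9,20-25' into (start, end) tuples.

    Raises ValueError if the matches do not cover the whole string.
    """
    ranges = []
    pos = 0
    while pos < len(text):
        # match() only succeeds if the pattern starts exactly at pos, like \G.
        m = RANGE_ITEM.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        start = int(m.group(1))
        # A single number such as "5" is treated as the range 5-5.
        end = int(m.group(2)) if m.group(2) is not None else start
        ranges.append((start, end))
        pos = m.end()
    return ranges
```

For example, parse_ranges("1-4,5,9,20-25") yields [(1, 4), (5, 5), (9, 9), (20, 25)], while "1-4,,5" raises ValueError at the second comma.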
We can apply the same approach to your problem, but it's extremely difficult to both validate and match your input with a single expression. (I don't know how to do it. If someone else does, I'd like to see it!) But you can validate the input separately:
/\(\s*(\s*\((\s*\d+\s+\d+\s*)\)\s*)+\s*\)/
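See Evan's answer for how it works. \d matches a digit ([0-9] for ASCII input), and \s matches a whitespace character such as a space, \t, \r or \n. To check the whole string rather than just find a valid substring somewhere inside it, anchor the expression with ^ and $.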
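A minimal Python sketch of the validation step, for illustration only: it writes the validator with a literal \( before each pair and uses re.fullmatch, so the entire string has to fit the pattern rather than some substring of it. The names PAIR_LIST and is_valid_pair_list are hypothetical.

```python
import re

# Outer "(", one or more "(a b)" pairs, then the outer ")".
PAIR_LIST = re.compile(r"\(\s*(\s*\((\s*\d+\s+\d+\s*)\)\s*)+\s*\)")


def is_valid_pair_list(text):
    # fullmatch() behaves like wrapping the pattern in ^...$ (without the
    # trailing-newline leniency of $), so extra text before or after fails.
    return PAIR_LIST.fullmatch(text) is not None
```

With this, "((1 2) (3 4))" and "( (10 20) )" are accepted, while "((1 2)", "((1 2) (3))" and "()" are rejected.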
This is an incremental match to extract the numbers:
/\G\(?\s*(?:\(\s*(\d+)\s+(\d+)\s*\))(?:(?=\s*\(\s*\d+\s+\d+\s*\))|\s*\))/
It breaks down like so (written with the /x modifier, which ignores whitespace in the pattern and allows # comments):
/\G # matches incrementally: \G anchors at the end of the previous match, or at the start of the string for the first one.
\(?\s* # the outer open paren (first match only) and any whitespace; on later matches the paren is simply absent.
(?: # begins a non-capturing group; it does not save matches.
\(\s* # the pair's open paren and optional whitespace.
(\d+) # matches the first number in the pair and stores it in a match variable.
\s+ # separating whitespace.
(\d+) # matches the second number in the pair and stores it in a match variable.
\s*\) # optional whitespace and the pair's close paren.
) # ends the group.
(?: # another non-capturing group.
(?= # positive lookahead: checks, without consuming anything, that another pair follows.
\s*\(\s*\d+\s+\d+\s*\) # look familiar?
) # end positive lookahead.
| # if no pair follows, this must be the last one, so...
\s*\) # ...consume optional whitespace and the outer close paren.
)/x # ... and end the group.
I haven't tested this, but I'm confident it would work. Each time you apply the expression to the string, it extracts the next pair of numbers, picking up where the previous match ended. The match for the last pair also consumes the outer closing paren, so on well-formed input the matches together cover the whole string; on malformed input, matching stops early. This is a Perl-compatible regular expression, so you may need to finesse it for Boost::regex.
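For illustration, here is a Python sketch of the extraction loop (Python stands in for Perl or Boost here, and extract_pairs is a hypothetical name). The pattern is the incremental expression above, written with re.VERBOSE, Python's counterpart of /x. Since Python's re has no \G, each attempt is anchored with Pattern.match(text, pos), and the loop raises ValueError when a match fails before the end of the string.

```python
import re

# The incremental expression in verbose form. Python has no \G, so the
# loop below anchors each attempt with PAIR.match(text, pos) instead.
PAIR = re.compile(r"""
    \(?\s*                          # outer open paren (first match only) and whitespace
    \(\s* (\d+) \s+ (\d+) \s*\)     # one "(a b)" pair; both numbers are captured
    (?:
        (?=\s*\(\s*\d+\s+\d+\s*\))  # another pair follows: stop before it
      | \s*\)                       # otherwise consume the outer close paren
    )
""", re.VERBOSE)


def extract_pairs(text):
    """Hypothetical helper: turn '((1 2) (3 4))' into [(1, 2), (3, 4)].

    Raises ValueError if matching stops before the end of the string.
    """
    pairs = []
    pos = 0
    while pos < len(text):
        m = PAIR.match(text, pos)  # must start exactly at pos, like \G
        if m is None:
            raise ValueError(f"malformed input at offset {pos}: {text[pos:]!r}")
        pairs.append((int(m.group(1)), int(m.group(2))))
        pos = m.end()
    return pairs
```

Like the original expression, this treats trailing whitespace after the final paren as an error, because nothing in the pattern consumes it; strip the input first if that matters.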