Difficulty using boost::regex_match to get separate matches for "NASIONAL" and "12" from string "NASIONAL12"

Question

The regex below works in every online regex tester I tried, but it produces no matches with boost::regex_match. Unfortunately I have to use boost::regex_match as is, because the surrounding system expects this format for more complicated parsing of street names.

std::string rformat = "(([a-zA-Z]*)|([0-9]*))?";
std::string source = "NASIONAL12";
const boost::regex piecesRegex(rformat);
boost::smatch      piecesMatch;
if (boost::regex_match(source, piecesMatch, piecesRegex))
{
   for (auto match : piecesMatch) {
       std::cerr << "MATCH:" << match << std::endl;
   }
}

What I need is for the first entry in "piecesMatch" to return "NASIONAL" and the second entry to return "12".

`boost::regex_match` requires a whole string match. Your regex shows 3 matches in the string, so you probably just need to match all pattern occurrences. Check [Boost C++ regex - how to return all matches](https://stackoverflow.com/questions/16665981/boost-c-regex-how-to-return-all-matches) — Wiktor Stribiżew, Feb 10 '22 at 21:16
Thank you, but how do I do this while still using regex_match? This same system is already used to return multiple matches for more complex strings like "US N-101 Hwy" (with a different regex of course) to break it down into US, N, and 101 — StainlessSteelRat, Feb 10 '22 at 21:30
I don't understand why you have the 'or' `|` specifier between the `([a-zA-Z]*)` and `([0-9]*)` atoms. Is that intentional? — G.M., Feb 10 '22 at 21:31
It's my attempt to say: create matches for portions of the string that are either all alphabetic or purely numeric in order to split it into "NASIONAL" and "12" — StainlessSteelRat, Feb 10 '22 at 21:33
You appear to want to capture *both* atoms as groups rather than just one or the other so try removing the `|`. It'll get you a lot closer to what you want. — G.M., Feb 10 '22 at 21:48
Yes thank you I removed the | but now I only get one match and it's for the entire string. — StainlessSteelRat, Feb 10 '22 at 21:52
Actually with this regex: "(([a-zA-Z]*)([0-9]*))?" I get 4 matches for some reason: MATCH:NASIONAL1 MATCH:NASIONAL1 MATCH:NASIONAL MATCH:1 — StainlessSteelRat, Feb 10 '22 at 21:54
So, it is just `std::string rformat = "([a-zA-Z]+)([0-9]+)";` or `std::string rformat = "([a-zA-Z]*)([0-9]*)";` — Wiktor Stribiżew, Feb 10 '22 at 22:15
You might want to use `boost::regex_search` instead. With that you can still control the begin and end of string or substrings as needed. — sln, Feb 10 '22 at 23:54
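To illustrate the two points raised in the comments above (whole-string matching, and collecting every occurrence of a pattern), here is a minimal sketch. It uses `std::regex` as a stand-in so that it compiles without Boost; `boost::regex`, `boost::regex_match` and `boost::sregex_iterator` offer the same interface, so swapping the namespace should be enough, though that is an assumption worth checking against your Boost version. The helper names `matches_whole` and `split_runs` are hypothetical, and the pattern `[a-zA-Z]+|[0-9]+` is only one way to express "a run of letters or a run of digits".

```cpp
#include <regex>
#include <string>
#include <vector>

// regex_match succeeds only if the pattern consumes the ENTIRE string.
// (std::regex used as a stand-in for boost::regex.)
bool matches_whole(const std::string& source, const std::string& pattern) {
    return std::regex_match(source, std::regex(pattern));
}

// Collect every run of letters or digits as a separate piece by
// iterating over all non-overlapping occurrences of the pattern.
std::vector<std::string> split_runs(const std::string& source) {
    static const std::regex run("[a-zA-Z]+|[0-9]+");
    std::vector<std::string> pieces;
    for (std::sregex_iterator it(source.begin(), source.end(), run), end;
         it != end; ++it) {
        pieces.push_back(it->str());
    }
    return pieces;
}
```

With the original pattern, `matches_whole("NASIONAL12", "(([a-zA-Z]*)|([0-9]*))?")` is false because a single alternative can cover only the letters or only the digits, never both. `split_runs("NASIONAL12")` yields the two pieces separately.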

StainlessSteelRat · Answer 1 · 2022-02-11T03:59:59.803

Thanks to everyone's help I found the right regex string:

([a-zA-Z]*)?([0-9]*)?
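For anyone wondering how this looks in code, here is a rough sketch of reading the capture groups after a successful whole-string match. It is written against `std::regex` so it builds without Boost; `boost::regex_match` and `boost::smatch` are used the same way as far as I know. The function name `capture_groups` is made up for the example. Note that index 0 of the match results is always the whole match, which is why looping over all entries prints more lines than there are groups.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the text of capture groups 1..N when the pattern matches the
// whole string, or an empty vector when it does not match.
// (std::regex used as a stand-in for boost::regex.)
std::vector<std::string> capture_groups(const std::string& source,
                                        const std::string& pattern) {
    const std::regex pieces(pattern);
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_match(source, m, pieces)) {
        // m[0] is the entire match; the parenthesised groups start at 1.
        for (std::size_t i = 1; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

Calling `capture_groups("NASIONAL12", "([a-zA-Z]*)?([0-9]*)?")` gives `NASIONAL` and `12` as the two groups.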

edited Feb 11 '22 at 03:59

answered Feb 10 '22 at 22:09

StainlessSteelRat


In fairness, it seems like you're trying to parse addresses. This may require a lot more. I'd look at existing tech (Google may have a library e.g.) or actual tokenizing and parsing. You will probably find that address layout varies wildly around the globe though. – sehe Feb 10 '22 at 22:20

Remember that in `([a-zA-Z])?([0-9])?` there are 2 groups; each is optional, and each matches at most 1 character. In the minimal case neither group matches anything, and the overall match still succeeds. Are you sure that's what you need? – sln Feb 10 '22 at 23:47
Hello y'all, these aren't full addresses, essentially just highway names, so I can pull specific values out of them. As for sln's question, it looks like the markup removed the `*` characters from the regex and turned the text between them into italics. I'll update the answer – StainlessSteelRat Feb 11 '22 at 03:59
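The point about optional groups applies to the `*` version as well: when every part of the pattern may match nothing, even an empty string is accepted. A small sketch of the difference between the `*` and `+` variants suggested in the question comments follows. It uses `std::regex` as a stand-in for `boost::regex`, and the function names `matches_permissive` and `matches_strict` are hypothetical.

```cpp
#include <regex>
#include <string>

// Both groups may be empty, so "" and "NASIONAL" are accepted too.
// (std::regex used as a stand-in for boost::regex.)
bool matches_permissive(const std::string& source) {
    static const std::regex pieces("([a-zA-Z]*)([0-9]*)");
    return std::regex_match(source, pieces);
}

// Requires at least one letter followed by at least one digit.
bool matches_strict(const std::string& source) {
    static const std::regex pieces("([a-zA-Z]+)([0-9]+)");
    return std::regex_match(source, pieces);
}
```

Which variant fits depends on whether names without a number (or an empty input) should count as a valid match in your system.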
