
This regex works fine in all online regex testers, but it produces no matches with `boost::regex_match`. Unfortunately, I must use `boost::regex_match` as-is because it is part of a system that expects this format for more complicated parsing of street names.

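```cpp
#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main() {
    std::string rformat = "(([a-zA-Z]*)|([0-9]*))?";
    std::string source = "NASIONAL12";
    const boost::regex piecesRegex(rformat);
    boost::smatch piecesMatch;
    // regex_match succeeds only if the pattern matches the whole of `source`
    if (boost::regex_match(source, piecesMatch, piecesRegex)) {
        for (const auto& match : piecesMatch) {
            std::cerr << "MATCH:" << match << std::endl;
        }
    }
}
```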

What I need is for the first group in `piecesMatch` to return "NASIONAL" and the second to return "12".

StainlessSteelRat
  • `boost::regex_match` requires a whole-string match. Your regex finds 3 matches in the string, so you probably just need to match all occurrences of the pattern. Check [Boost C++ regex - how to return all matches](https://stackoverflow.com/questions/16665981/boost-c-regex-how-to-return-all-matches) – Wiktor Stribiżew Feb 10 '22 at 21:16
  • Thank you, but how do I do this while still using regex_match? This same system is already used to return multiple matches for more complex strings like "US N-101 Hwy" (with a different regex of course) to break it down into US, N, and 101 – StainlessSteelRat Feb 10 '22 at 21:30
  • I don't understand why you have the 'or' `|` specifier between the `([a-zA-Z]*)` and `([0-9]*)` atoms. Is that intentional? – G.M. Feb 10 '22 at 21:31
  • It's my attempt to say: create matches for portions of the string that are either all alphabetic or purely numeric in order to split it into "NASIONAL" and "12" – StainlessSteelRat Feb 10 '22 at 21:33
  • You appear to want to capture *both* atoms as groups rather than just one or the other so try removing the `|`. It'll get you a lot closer to what you want. – G.M. Feb 10 '22 at 21:48
  • Yes, thank you. I removed the `|`, but now I only get one match, and it's for the entire string. – StainlessSteelRat Feb 10 '22 at 21:52
  • Actually, with this regex, `"(([a-zA-Z]*)([0-9]*))?"`, I get 4 matches for some reason: `MATCH:NASIONAL1`, `MATCH:NASIONAL1`, `MATCH:NASIONAL`, `MATCH:1` – StainlessSteelRat Feb 10 '22 at 21:54
  • So, it is just `std::string rformat = "([a-zA-Z]+)([0-9]+)";` or `std::string rformat = "([a-zA-Z]*)([0-9]*)";` – Wiktor Stribiżew Feb 10 '22 at 22:15
  • Yup, that's what I found thanks to y'all's input! – StainlessSteelRat Feb 10 '22 at 22:16
  • You might want to use `boost::regex_search` instead. With that, you can still control the beginning and end of the string or substrings as needed. – sln Feb 10 '22 at 23:54

1 Answer


Thanks to everyone's help, I found the right regex string:

`([a-zA-Z]*)?([0-9]*)?`
StainlessSteelRat
  • In fairness, it seems like you're trying to parse addresses. This may require a lot more. I'd look at existing tech (Google may have a library, for example) or at actual tokenizing and parsing. You will probably find that address layouts vary wildly around the globe, though. – sehe Feb 10 '22 at 22:20
  • Remember that in `([a-zA-Z])?([0-9])?` there are 2 groups; each is optional, and each matches at most 1 character. In the minimal case, neither group matches anything and a match still occurs. Are you sure that's what you need? – sln Feb 10 '22 at 23:47
  • Hello y'all, these aren't full addresses, essentially just highway names, so I can pull specific values out of them. As for sln's question, it looks like the markup removed the `*`s from the regex and turned the text between them into italics. I'll update the answer. – StainlessSteelRat Feb 11 '22 at 03:59