I need to match strings in C++ code of the form

L, N{1, 3}, N{1, 3}, N{1, 3} 

where, in the above pseudo-code, L is always a letter (upper or lower case) or a full stop (the . character) and N is always a digit [0-9].

So, explicitly, we might have B, 999, 999, 999 or ., 8, 8, 8. The number of digits after each comma is always the same and is 1, 2 or 3; so D, 23, 232, 23 is not possible.

In C# I would match this as follows

string s = "   B,801, 801, 801 other stuff";
Regex reg = new Regex(@"[\.\w],\s*\d{1,3},\s*\d{1,3},\s*\d{1,3}");
Match m = reg.Match(s);

Great. However, I need a similar regex using boost::regex. I have attempted

std::string s = "   B,801, 801, 801 other stuff";
boost::regex regex("[\\.\w],\s*\d{1,3},\s*\d{1,3},\s*\d{1,3}");
boost::match_results<std::string::const_iterator> results;
boost::regex_match(s, results, regex);

but this is giving me 'w' : unrecognized character escape sequence and the same for s and d. But from the documentation I was under the impression I can use \d, \s and \w without issue.

What am I doing wrong here?


Edit: I have switched to std::regex as per a comment. Presumably the regex is the same; the following compiles, but the regex does not match:

std::string p = "XX";
std::string s = "    B,801, 801, 801 other stuff";
std::regex regex(R"del([\.\w],\s*\d{1,3},\s*\d{1,3},\s*\d{1,3})del");
if (std::regex_match(s, regex))
   p = std::regex_replace(s, regex, "");
MoonKnight
  • C++ has escape characters, just like C# does. It also has raw string literals, just like C# has verbatim strings. For what good it does, C++ also has a standard regular expressions library. – ghostofstandardspast Jun 18 '14 at 16:03

1 Answer

You can use \w, \s, and \d in your regular expressions. However, that's not what you're doing; you're writing \w inside a C++ string literal, where the compiler tries to read it as an escape sequence and does not recognize it. For the string to actually contain a \ followed by a w, you need to escape the \ (same for s and d, of course):

boost::regex regex("[\\.\\w],\\s*\\d{1,3},\\s*\\d{1,3},\\s*\\d{1,3}");

As of C++11, you can use raw string literals to make your code even more similar to the C# version:

boost::regex regex(R"del([\.\w],\s*\d{1,3},\s*\d{1,3},\s*\d{1,3})del");
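
As a quick check that the two spellings above hand the same pattern to the regex library, here is a minimal sketch using plain std::string. The function names escaped_pattern and raw_pattern are hypothetical, chosen only for illustration, and del is just an arbitrary raw-string delimiter.

```cpp
#include <string>

// Ordinary string literal: each "\\" in the source becomes a single
// backslash in the resulting string, so the regex engine sees \w, \s, \d.
std::string escaped_pattern() {
    return "[\\.\\w],\\s*\\d{1,3},\\s*\\d{1,3},\\s*\\d{1,3}";
}

// Raw string literal (C++11): backslashes are taken literally, so the
// pattern can be written exactly as in the C# verbatim string.
std::string raw_pattern() {
    return R"del([\.\w],\s*\d{1,3},\s*\d{1,3},\s*\d{1,3})del";
}
```

Comparing escaped_pattern() == raw_pattern() should yield true, since both literals describe the same sequence of characters.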
Eric Finn
  • Thanks for your answer. I have run the regex using `boost` and `std::regex`, but neither matches with the above regex... – MoonKnight Jun 18 '14 at 16:57
  • @Killercam Does replacing `[\\.\\w]` with `[\\._[:alnum:]]` (with `std::regex`) work for you? – Eric Finn Jun 18 '14 at 17:00
  • No, if I have `std::string s = " B,801, 801, 801 other stuff";` and `std::regex regex("\\w");` then `std::regex_match(s, regex)` returns false!? – MoonKnight Jun 18 '14 at 17:15
  • @Killercam Correct. [`std::regex_match`](http://en.cppreference.com/w/cpp/regex/regex_match) attempts to match an entire string. I think what you want is [`std::regex_search`](http://en.cppreference.com/w/cpp/regex/regex_search) – Eric Finn Jun 18 '14 at 17:20
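
The difference described in the last comment can be sketched as follows, assuming std::regex with its default ECMAScript grammar and a standard library that handles \w inside a bracket expression. The helper names pattern, matches_whole, contains and first_match are hypothetical, introduced only for this illustration.

```cpp
#include <regex>
#include <string>

// Pattern from the question, built once from a raw string literal.
const std::regex& pattern() {
    static const std::regex re(R"del([\.\w],\s*\d{1,3},\s*\d{1,3},\s*\d{1,3})del");
    return re;
}

// std::regex_match succeeds only if the pattern covers the ENTIRE string.
bool matches_whole(const std::string& s) {
    return std::regex_match(s, pattern());
}

// std::regex_search succeeds if the pattern matches ANY substring.
bool contains(const std::string& s) {
    return std::regex_search(s, pattern());
}

// Returns the first matching substring, or an empty string if none is found.
std::string first_match(const std::string& s) {
    std::smatch m;
    return std::regex_search(s, m, pattern()) ? m.str() : std::string();
}
```

With the input from the question, "    B,801, 801, 801 other stuff", matches_whole returns false because of the leading spaces and the trailing text, while contains returns true and first_match returns "B,801, 801, 801".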