I have a string like

"<firstname>Anna</firstname>"

or

"<firstname>Anna Lena</firstname>"

and I want to use a regex to extract the name from it (so only "Anna" or "Anna Lena"). Currently I'm using:

std::regex reg1 ("(<firstname>)([a-zA-Z0-9]*)(</firstname>)");

and

std::regex_replace (std::back_inserter(result), input.begin(), input.end(), reg1, "$2");

This works well with a single name, but it misses anything after the first word because the character class does not include whitespace. I tried adding \s, as in ((([a-zA-Z0-9]*)|\s)*), but my IDE (Qt) tells me that \s is an unknown escape sequence. Right now, "<firstname>Anna Lena</firstname>" results in "<firstname>Anna".

How do I solve this in an elegant way?

Saftkeks
  • `std::regex reg1("(<firstname>)([a-zA-Z0-9\\s]*)(</firstname>)");` or `std::regex reg1(R"((<firstname>)([a-zA-Z0-9\s]*)(</firstname>))");` – Wiktor Stribiżew Jun 19 '16 at 16:23
  • Results remain the same... – Saftkeks Jun 20 '16 at 07:40
  • [Here](https://ideone.com/fHdqn7) it is working well. I removed the capturing groups from the firstname tags, and got the `match[1]` Group 1 value. Why do you mention Qt while you are using `std::regex`? What exactly are you using? Please share the full relevant code – Wiktor Stribiżew Jun 20 '16 at 07:50
  • Could you please show the whole relevant code? I think the problem is not just with the regex. I suggest using `regex_search` or `regex_match`, but I guess you have a vector of strings, and you want to modify this vector. – Wiktor Stribiżew Jun 20 '16 at 09:07

2 Answers

Use a reluctant quantifier for dot:

std::regex reg1 ("<firstname>(.*?)</firstname>");

Alternatively, you can match "anything that is not a left angle bracket":

std::regex reg1 ("<firstname>([^<]*)</firstname>");

Note that I removed the unnecessary groups around the tag literals, so the target is now group 1 (your regex captured it in group 2).

Bohemian
  • `std::regex reg1 ("<firstname>(.*?)</firstname>");` grabs "Anna", and nothing if there's only one name; the same goes for the `[^<]*` variant. Maybe I'm doing something wrong that's very basic...? – Saftkeks Jun 20 '16 at 07:37
  • @saf here's a [live demo](http://rubular.com/r/6ve4mPqxXH) of the regex capturing "Anna Lena" as group 1. – Bohemian Jun 20 '16 at 11:43

It seems to me the issue is with the back_inserter passed to regex_replace: it automatically appends new elements to the end of the container.

I suggest adding \s to the character class (written as \\s inside an ordinary C++ string literal), matching each string with regex_match, and reading the captured group rather than rewriting the strings through regex_replace.

Here is a demo of my approach:

#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> strings;
    strings.push_back("<firstname>Anna</firstname>");
    strings.push_back("<firstname>Anna Lena</firstname>");
    // "\\s" in the string literal reaches the regex engine as \s (whitespace).
    std::regex reg("(<firstname>)([a-zA-Z0-9\\s]*)(</firstname>)");
    for (size_t k = 0; k < strings.size(); k++)
    {
        std::smatch s;
        // regex_match succeeds only if the whole string matches the pattern.
        if (std::regex_match(strings[k], s, reg)) {
            strings[k] = s[2];  // group 2 holds the name between the tags
            std::cout << strings[k] << std::endl;
        }
    }
    return 0;
}

Output:

Anna
Anna Lena
Wiktor Stribiżew