
I am reading an HTML page and trying to retrieve a specific string within it.

I have the following code:

    std::string str = test.GetString(); // the HTML page as a string; I have verified its contents
    std::smatch match;
    std::regex re("0x(\\d|[A-Z]).*0000"); // the pattern I'm searching for
    if (std::regex_search(str, match, re)){
        test = "found"; // found gets printed
    }
    TRACE("%s\n",match[0]); // this outputs some garbage like this '˜ò'

I want to print/store the result of the match found but I get some garbage instead.

Disclaimer: I'm new to C++ regex, so I might be making a basic mistake.

Mr.C64
Rana

2 Answers

std::smatch match;
...
TRACE("%s\n",match[0]); // this outputs some garbage like this '˜ò'

The %s type specifier in the TRACE macro expects a raw C string pointer: char* in ANSI/MBCS builds, wchar_t* in Unicode builds. I'm assuming you are doing an ANSI/MBCS build here.

But match[0] is not a raw C string pointer.

So you have a mismatch between what you promised to TRACE via %s (i.e. a raw C string char* pointer), and what you are actually passing to it (i.e. match[0]).

std::smatch is a specialization of the std::match_results class template, in particular:

smatch --> match_results<string::const_iterator>

smatch::operator[] (which you are invoking in your code as match[0]) returns a reference to another object, which is a std::sub_match. The std::sub_match class represents a pair of iterators that delimit the matched sequence of characters inside the searched string.
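To make the "pair of iterators" idea concrete, here is a minimal sketch. The helper name match_span is hypothetical (it is not part of the original code); it only relies on the first and second members that std::sub_match exposes, and reports where the match sits inside the searched string.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper (not from the original post): returns {offset, length}
// of the whole match inside `input`, or {npos, 0} if nothing matched.
std::pair<std::size_t, std::size_t>
match_span(const std::string& input, const std::regex& re)
{
    std::smatch match;
    if (!std::regex_search(input, match, re)) {
        return {std::string::npos, std::size_t{0}};
    }

    // match[0] is a std::ssub_match, i.e. sub_match<std::string::const_iterator>.
    // It stores no characters of its own, only two iterators into `input`.
    const std::ssub_match& whole = match[0];

    // `first` points at the start of the match, `second` one past its end.
    const auto offset = static_cast<std::size_t>(
        std::distance(input.cbegin(), whole.first));
    const auto length = static_cast<std::size_t>(
        std::distance(whole.first, whole.second));
    return {offset, length};
}
```

Because the sub_match only points into the searched string, that string has to outlive the match object. This is also why the object cannot be handed to %s as if it were text.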

So, you are promising to TRACE to pass a raw C string pointer (via the %s type specifier), but you are actually passing a completely different thing, i.e. a reference to a std::sub_match object (via your match[0] code): no wonder that the printed text is meaningless.

What you have to do is to obtain a C string pointer from the match[0] expression.

To do that, you can invoke the std::sub_match's str() method. This returns a std::string object.

However, this std::string object is not exactly what %s expects: in fact, %s represents a raw C string pointer (e.g. const char*), not a std::string instance.

So, the last step is to extract this raw C string pointer from the std::string object, and this is done by invoking the std::string::c_str() method.

To summarize these logical steps:

std::smatch match;
...
match[0]               --> reference to std::sub_match object
match[0].str()         --> std::string object
match[0].str().c_str() --> raw C string pointer (const char*)

So, your TRACE statement can be written as:

TRACE("%s\n", match[0].str().c_str());
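Here is a self-contained sketch of the same chain outside MFC. Since TRACE is not available in a plain console program, std::snprintf stands in for it as another consumer of a C-style %s; the helper name format_match and the 256-byte buffer are assumptions made only for this illustration.

```cpp
#include <cstdio>
#include <regex>
#include <string>

// Hypothetical helper: formats the whole match through a C-style "%s",
// the way TRACE would, and returns the formatted text so it can be inspected.
std::string format_match(const std::string& html)
{
    const std::regex re("0x(\\d|[A-Z]).*0000");  // pattern from the question
    std::smatch match;
    if (!std::regex_search(html, match, re)) {
        return "no match";
    }

    char buffer[256];  // arbitrary size for the sketch; longer matches are truncated
    // match[0]               -> std::sub_match
    // match[0].str()         -> std::string (a temporary)
    // match[0].str().c_str() -> const char*, valid until the end of this statement
    std::snprintf(buffer, sizeof buffer, "%s", match[0].str().c_str());
    return buffer;
}
```

Calling format_match("<p>id=0x7F0000</p>") should give back the text 0x7F0000, because the const char* passed for %s really points at the matched characters.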
Mr.C64
  • Wow, This is the perfect answer! I understand it completely now and I want to hug you! Thank you so much for this! – Rana Feb 11 '16 at 19:28
  • Just wondering, can I tell `TRACE` to expect a `std::string`? instead of `%s` – Rana Feb 11 '16 at 19:41
  • @Rana: No, because `TRACE` is based on **C-style** `%s`, instead `std::string` is a **C++ class**, not a C-style raw string pointer. There is no implicit "conversion" from `std::string` to `%s`: you must call the `std::string`'s **`c_str()`** method to get the C-style `const char*` from the `std::string` object (and pass it according to `%s`). – Mr.C64 Feb 11 '16 at 19:44

The problem here is that match[0] returns an object of type sub_match, which is simply a pair of iterators. Since the first argument to the TRACE macro is a C-style format string, convert the sub_match object to a C string like this:

TRACE("%s\n", std::string(match[0]).c_str());

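That is, use sub_match's conversion operator to std::string to get a temporary C++ string object, then call its member function c_str() to get a pointer to a C string. That pointer stays valid only until the temporary string is destroyed at the end of the full expression, which is long enough for the TRACE call.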
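As a quick check that the conversion operator, the str() member function, and constructing from the iterator pair all produce the same text, here is a small sketch. The helper name conversions_agree is hypothetical and exists only for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when the three ways of getting a
// std::string out of match[0] agree, false if they differ or nothing matched.
bool conversions_agree(const std::string& input, const std::regex& re)
{
    std::smatch match;
    if (!std::regex_search(input, match, re)) {
        return false;
    }

    // 1. sub_match's conversion operator to std::string.
    const std::string via_conversion = std::string(match[0]);
    // 2. The str() member function.
    const std::string via_str = match[0].str();
    // 3. Building the string by hand from the stored iterator pair.
    const std::string via_iterators(match[0].first, match[0].second);

    return via_conversion == via_str && via_str == via_iterators;
}
```

Whichever form is used, keeping the result in a named std::string variable is the safer choice whenever the const char* from c_str() has to be used beyond a single statement.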

Pete Becker
  • Even though it works, I'm not sure what is going on. Could you please explain a bit more? – Rana Feb 11 '16 at 18:58
  • I have checked the value of `match[0]` in the debugger and it has a value of `157`. What does this represent, the object id number? – Rana Feb 11 '16 at 19:01
  • I have no idea what that value means. It sounds like the debugger is confused. `std::match_results` is, more or less, an array of `sub_match` objects. `match[0]` is a `sub_match` object that points to the text that matched all of the regular expression. That is, it holds a pair of iterators; the first points at the start of the matched sequence, and the second points past-the-end of the matched sequence. So you could construct a `string` object like this: `std::string m(match[0].first, match[0].second);`. It's easier to use the conversion operator that `sub_match` provides. – Pete Becker Feb 11 '16 at 19:04
  • That makes sense. Last question (thank you so much for your time): why use `c_str` afterwards? – Rana Feb 11 '16 at 19:07