Boost regex expression capture

Question

My goal is to capture an integer using boost::regex_search.

#define BOOST_REGEX_MATCH_EXTRA

#include <boost/regex.hpp>
#include <cstdlib>   // system()
#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
  std::string tests[4] = {
    "SomeString #222",
    "SomeString #1",
    "SomeString #42",
    "SomeString #-1"
  };

  boost::regex rgx("#(-?[0-9]+)$");

  boost::smatch match;

  for(int i=0;i< 4; ++i)
  {
    std::cout << "Test " << i << std::endl;

    boost::regex_search(tests[i], match, rgx, boost::match_extra);

    for(boost::smatch::size_type j=0; j< match.size(); ++j)
    {
      std::string match_string;
      match_string.assign(match[j].first, match[j].second);
      std::cout << "    Match " << j << ": " << match_string << std::endl;
    }
  }

  system("pause");
}

I notice that each regex search produces two sub-matches: the first (index 0) is the whole matched text, and the second (index 1) is the capture in parentheses.

Test 0
    Match 0: #222
    Match 1: 222
Test 1
    Match 0: #1
    Match 1: 1
Test 2
    Match 0: #42
    Match 1: 42
Test 3
    Match 0: #-1
    Match 1: -1

The documentation discourages the use of BOOST_REGEX_MATCH_EXTRA unless it is needed. Is it required in order to capture a single group within parentheses, or is there another way?

It would be good if you could link to the specific part of the documentation that discourages the use of `BOOST_REGEX_MATCH_EXTRA`. There is another way, but I would discourage that other way much more than using parentheses (if performance is the reason)! — Jerry, Apr 14 '14 at 17:37
@Jerry "that other way" might well be much more performant. Did you profile it for the OP's situation? — sehe, Apr 14 '14 at 19:10
@sehe The 'other' way (using regex alone) is *always* less performant; that's the trade-off between functionality and performance. — Jerry, Apr 14 '14 at 19:13

sehe · Answer 1 · 2014-04-14T19:55:08.283

If you want more speed, perhaps Boost Spirit could deliver it, or alternatively Boost Xpressive.

Both generate code from expression templates at compile time. Among other things, this means that if you don't "absorb" any attribute values (that is, you don't ask for the parsed result to be stored), you incur no cost for them.

Boost Spirit:

This solution is header-only. It can probably be made more efficient, but here's a start:

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;

int main()
{
    std::string const tests[] = {
        "SomeString #222",
        "SomeString #1",
        "SomeString #42",
        "SomeString #-1"
    };

    for(auto& input : tests)
    {
        int value;
        auto f(input.begin()), l(input.end());
        if (qi::phrase_parse(f, l,  // input iterators
                    qi::omit [ *~qi::char_('#') ] >> '#' >> qi::int_, // grammar
                    qi::space,      // skipper
                    value)          // output attribute
            && f == l)              // require the whole input to be consumed
        {
            std::cout << "     Input '" << input << "' -> " << value << "\n";
        }
    }
}

See it Live On Coliru

Boost Xpressive

#include <boost/xpressive/xpressive_static.hpp>
#include <iostream>
#include <string>
namespace xp = boost::xpressive;

int main()
{
    std::string const tests[] = {
        "SomeString #222",
        "SomeString #1",
        "SomeString #42",
        "SomeString #-1"
    };

    for(auto& input : tests)
    {
        static xp::sregex rex = (xp::s1= -*xp::_) >> '#' >> (xp::s2= !xp::as_xpr('-') >> +xp::_d);

        xp::smatch what;

        if(xp::regex_match(input, what, rex))
        {
            std::cout << "Input '" << what[0] << "' -> " << what[2] << '\n';
        }
    }
}

See it Live On Coliru too.

I have a hunch that the Spirit solution is going to be more performant, and closer to what you want, because it parses the input against a grammar and stores the result directly in your desired data type (here, an `int`).

Added Boost Xpressive solution, in addition to the Boost Spirit one. — sehe, Apr 14 '14 at 19:55
