
I am using Boost.Regex (Boost 1.42) to remove the first line of a fairly large multi-line string whose lines end in '\n'.

That is, I am using regex_replace to do something akin to s/(.*?)\n//:

  string
  foo::erase_first_line(const std::string & input) const
  {
    static const regex line_expression("(.*?)\n");
    string  empty_string;

    return boost::regex_replace(input,
                                line_expression,
                                empty_string,
                                boost::format_first_only);
  }

This code is throwing the following exception:

"terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >'
  what():  The complexity of matching the regular expression exceeded predefined bounds.  Try refactoring the regular expression to make each choice made by the state machine unambiguous.  This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate."

Interestingly/annoyingly, this doesn't seem to happen in test programs with the same test data. Any thoughts on why this could be happening and/or how to fix it?

decimus phostle
  • Why do you need `?` after `.*`? – Maxim Egorushkin Feb 11 '11 at 16:11
  • To make the ".*" part match as little as possible. "." will match newlines by default, so without the "?" the pattern would slurp nearly the whole string (up to the last newline), not just the first line. – nobody Feb 11 '11 at 18:31
  • The reason is as Andrew mentioned. [Greedy vs. Non-greedy match.](http://www.troubleshooters.com/codecorn/littperl/perlreg.htm#Greedy) – decimus phostle Feb 11 '11 at 19:21
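
The greedy vs. non-greedy difference described in the comments above can be seen in a small sketch. It rests on assumptions: `std::regex` stands in for Boost.Regex so that it compiles without Boost, and because the ECMAScript `.` does not match newlines, `(?:.|\n)` is used to imitate Boost's default dot. The function names `erase_greedy` and `erase_lazy` are hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: "(?:.|\n)" imitates Boost.Regex's default '.', which also
// matches newlines; std::regex's ECMAScript '.' does not.

// Greedy: the repeat grabs as much as it can, so the match runs from the
// start of the input through the LAST newline.
std::string erase_greedy(const std::string& input)
{
    static const std::regex greedy("(?:.|\n)*\n");
    return std::regex_replace(input, greedy, "",
                              std::regex_constants::format_first_only);
}

// Non-greedy: the repeat grabs as little as it can, so the match stops at
// the FIRST newline.
std::string erase_lazy(const std::string& input)
{
    static const std::regex lazy("(?:.|\n)*?\n");
    return std::regex_replace(input, lazy, "",
                              std::regex_constants::format_first_only);
}
```

For the input "a\nb\nc\nrest", `erase_greedy` leaves only "rest", while `erase_lazy` leaves "b\nc\nrest".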

1 Answer


Try putting a "beginning of string" marker ("\A" in the default Perl-compatible mode) at the beginning of the regex, to make it more explicit that you want it to match just the first line.

Without an explicit match on the beginning of the string, it looks like Boost is applying its "leftmost longest" rule, and that is what causes this: http://www.boost.org/doc/libs/1_45_0/libs/regex/doc/html/boost_regex/syntax/leftmost_longest_rule.html
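
A minimal sketch of the anchored pattern, under some assumptions: it uses `std::regex` as a stand-in for Boost.Regex so that it compiles without Boost, it is a free function rather than the `foo` member from the question, and it uses `^` because the ECMAScript grammar has no `\A`. With Boost.Regex the call has the same shape (`boost::regex`, `boost::regex_replace`, `boost::format_first_only`), but this sketch has not been checked against Boost 1.42.

```cpp
#include <regex>
#include <string>

// Hypothetical free-function version of the question's erase_first_line.
// Assumption: std::regex stands in for Boost.Regex.
std::string erase_first_line(const std::string& input)
{
    // '^' ties the match to the start of the input, so the engine has only
    // one starting position to consider for the first line.
    static const std::regex line_expression("^(.*?)\n");

    // format_first_only replaces the first match and copies the rest of
    // the input through unchanged.
    return std::regex_replace(input, line_expression, "",
                              std::regex_constants::format_first_only);
}
```

An input without any newline does not match and is returned unchanged.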

nobody
  • Duh! That works in my case, as I am looking for a 'first match' that is always going to be at the start of the text/string. It leaves me with two questions, though: (i) What if my 'first match' were not at the start of the string and I could not resort to '^'? (ii) Why does this exception get thrown in one code context and not in another (the test program vs. the actual code) when the code and the test data are the same? Marking this as the answer, as it solved my problem. I modified the regex to "^(.*?)\n". – decimus phostle Feb 11 '11 at 19:28
  • Sorry to resurrect this from a decade ago, but I'm having the exact problem you describe in this comment, haha. If I solve it, I'll post here. – Simeon Sep 06 '22 at 08:53