
My situation: I'm new to Spirit, I have to use VC6 and am thus using Spirit 1.6.4.

I have a line that looks like this:

//The Description;DESCRIPTION;;

I want to put the text DESCRIPTION in a string if the line starts with //The Description;.

I have something that works, but it doesn't look very elegant to me:

vector<char> vDescription; // std::string doesn't work due to missing ::clear() in VC6's STL implementation
if(parse(chars,
    // Begin grammar
    (
       as_lower_d["//the description;"]
    >> (+~ch_p(';'))[assign(vDescription)]
    ),
    // End grammar
    space_p).hit)
{
    const string desc(vDescription.begin(), vDescription.end());
}

I would much rather assign all printable characters up to the next ';', but the following doesn't work: parse(...).hit is false.

if(parse(chars,
        // Begin grammar
        (
           as_lower_d["//the description;"]
        >> (+print_p)[assign(vDescription)]
        >> ';'
        ),
        // End grammar
        space_p).hit)

How do I make it hit?

foraidt

2 Answers


You're not getting a hit because ';' is matched by print_p. Try this:

if(parse(chars,
    // Begin grammar
    (
       as_lower_d["//the description;"]
    >> (+(print_p-';'))[assign(vDescription)]
    >> ';'
    ),
    // End grammar
    space_p).hit)
Fred Larson
  • Thanks, I will try this tomorrow. It looks like there is a fundamental misunderstanding on my side then. I assumed the parser would try to match things if possible and not be so lazy... Do you know what the term for this behaviour is? – foraidt Jan 21 '09 at 22:52
  • I think the term is "greedy". See http://www.boost.org/doc/libs/1_35_0/libs/spirit/doc/faq.html#greedy_rd – Fred Larson Jan 21 '09 at 22:57

You might try using confix_p:

confix_p(as_lower_d["//the description;"],
         (+print_p)[assign(vDescription)],
         ch_p(';')
        )

It should be equivalent to Fred's response.

Your code fails because +print_p is greedy: it consumes characters until it reaches the end of the input or a non-printable character. A semicolon is printable, so print_p claims it. The input is exhausted, the variable is assigned, and the match then fails because nothing is left for the final ';' in your parser to match.
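To make the greedy behaviour concrete, here is a minimal sketch in plain standard C++. It does not use Spirit and is not how Spirit is implemented; consume_printable and greedy_then_semicolon are hypothetical helpers written only for illustration, and the space_p skipper is ignored.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not a Spirit API: greedily consume printable
// characters starting at pos, roughly what +print_p does.
// Returns how many characters were consumed (0 means no match).
std::size_t consume_printable(const std::string& in, std::size_t pos)
{
    std::size_t n = 0;
    while (pos + n < in.size() &&
           std::isprint(static_cast<unsigned char>(in[pos + n])))
        ++n;
    return n;
}

// Mimics  +print_p >> ';'  with no backtracking into the repetition.
bool greedy_then_semicolon(const std::string& in)
{
    const std::size_t n = consume_printable(in, 0);
    if (n == 0)
        return false;
    // ';' is itself printable, so the loop above never stops in front
    // of one; the character at position n (if any) cannot be ';'.
    return n < in.size() && in[n] == ';';
}
```

Called on "DESCRIPTION;;", consume_printable swallows all 13 characters, both semicolons included, so greedy_then_semicolon has nothing left to match and reports failure.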

Fred's answer constructs a new parser, (print_p - ';'), which matches everything print_p does, except for semicolons. "Match everything except X, and then match X" is a common pattern, so confix_p is provided as a shortcut for constructing that kind of parser. The documentation suggests using it for parsing C- or Pascal-style comments, but that's not required.
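The "match everything except X, and then match X" pattern can be sketched by hand in standard C++ as follows. This is only an illustration of the idea behind +(print_p - ';') >> ';', not Spirit's or confix_p's actual code; match_until_close is a hypothetical name, and skipping is ignored.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not a Spirit API. Mimics
//     +(print_p - close) >> close
// On success, stores the text in front of the closing character in out.
bool match_until_close(const std::string& in, char close, std::string& out)
{
    std::size_t n = 0;
    // Consume printable characters, but stop in front of the closer.
    while (n < in.size() && in[n] != close &&
           std::isprint(static_cast<unsigned char>(in[n])))
        ++n;
    // Need at least one character, and the closer must come next.
    if (n == 0 || n == in.size() || in[n] != close)
        return false;
    out.assign(in, 0, n);
    return true;
}
```

Because the repetition refuses to consume the closing character, the closer is still available when the repetition ends, and no backtracking is needed. For the input "DESCRIPTION;;" and close ';', out receives "DESCRIPTION".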

For your code to work, Spirit would need to recognize that the greedy print_p matched too much and then backtrack to allow matching less. But although Spirit will backtrack, it won't backtrack to the "middle" of what a sub-parser would otherwise greedily match. It will backtrack to the next "choice point," but your grammar doesn't have any. See Exhaustive backtracking and greedy RD in the Spirit documentation.
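For contrast, a backtracking regex engine does give back characters from a greedy repetition. The sketch below uses std::regex, which requires C++11 and is therefore not an option on VC6; it is included only to show what "backtracking into the middle" would produce. The function name capture_with_backtracking and the pattern are assumptions made for this illustration.

```cpp
#include <regex>
#include <string>

// Illustration only: a regex analogue of
//     as_lower_d["//the description;"] >> +print_p >> ';'
// Returns the captured text, or an empty string when nothing matches.
std::string capture_with_backtracking(const std::string& line)
{
    static const std::regex re("^//the description;([[:print:]]+);",
                               std::regex::ECMAScript | std::regex::icase);
    std::smatch m;
    if (!std::regex_search(line, m, re))
        return std::string();
    return m[1].str();
}
```

On the sample line "//The Description;DESCRIPTION;;" the regex does match, but it gives back only one character, so the capture is "DESCRIPTION;" with a trailing semicolon. Excluding the delimiter from the repetition, as in (print_p - ';'), is therefore the cleaner fix even where backtracking is available.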

Rob Kennedy
  • Thanks, this works too, and looks even better. I have a small correction though: To get only the text _in_between_ confix_p's opening and closing, the [assign()] action has to be put behind the print_p instead of behind confix_p(). – foraidt Jan 22 '09 at 08:57
  • Ah, you're right. I skimmed over the documentation too fast. It looks wrong at first, but the parser fixes it to do the right thing. – Rob Kennedy Jan 22 '09 at 13:55