
Hi guys, I want to parse a text using Byacc. The text is made up only of words separated by spaces and newlines. What do you think of these rules for parsing such a text?

text : /* empty string */ { $$ = ""; }
     | TEXT { $$ = $1; }
     | TEXT whitespace text { $$ = $1 + $2 + $3; }
     | TEXT line whitespace text { $$ = $1 + $2 + $4; }

The token TEXT is defined in the JFlex file and represents a single word. The other two rules, whitespace and line, are below:

line : NL { $$ = System.lineSeparator(); }
     | line NL { $$ = $1 + System.lineSeparator(); }

whitespace : WHITESPACE { $$ = " "; }
           | whitespace WHITESPACE { $$ = $1 + " "; }

Is my "text" rule wrong? Thanks

Seki

1 Answer


No rule is "wrong" per se; a rule is what it is. The question is, does it do what you want it to do? So what do you want it to do? What do you want your parser to accept, and what do you want it to reject as a syntax error?

Your text rule is right recursive, so it will require parser stack space proportional to the length of the input (you'll push the entire input onto the stack, then reduce it right to left). Left recursive would be better, but if you need to do the reductions right to left for some reason, right recursive is fine. Nothing in your actions seems to require right-to-left reductions, as all they do is string concatenation, which is associative.
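To make the stack-space point concrete, here is a small Java hand-simulation (not byacc-generated code) of the shift/reduce steps for a left-recursive `text : /* empty */ | text WORD` versus a right-recursive `text : /* empty */ | WORD text`. The class name, the `WORD` token and the `Result` holder are illustrative assumptions; whitespace and newlines are left out to keep it short, and the depth counted is grammar symbols, not the real parser's state stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/**
 * Hand-simulates the shift/reduce steps an LR parser would take for two
 * equivalent grammars (a sketch, NOT byacc output):
 *   left:  text : empty | text WORD ;
 *   right: text : empty | WORD text ;
 * and records the deepest the symbol stack gets.
 */
public class RecursionDepth {

    static final class Result {
        final String value;
        final int maxDepth;
        Result(String value, int maxDepth) { this.value = value; this.maxDepth = maxDepth; }
    }

    static Result parseLeft(List<String> words) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("");                      // reduce empty -> text before the first word
        int max = stack.size();
        for (String w : words) {
            stack.push(w);                   // shift WORD
            max = Math.max(max, stack.size());
            String word = stack.pop();       // reduce text WORD -> text immediately
            String text = stack.pop();
            stack.push(text + word);
        }
        return new Result(stack.pop(), max);
    }

    static Result parseRight(List<String> words) {
        Deque<String> stack = new ArrayDeque<>();
        for (String w : words) {
            stack.push(w);                   // shift every WORD; nothing can be reduced yet
        }
        stack.push("");                      // at end of input: reduce empty -> text
        int max = stack.size();
        while (stack.size() > 1) {
            String text = stack.pop();       // reduce WORD text -> text, right to left
            String word = stack.pop();
            stack.push(word + text);
        }
        return new Result(stack.pop(), max);
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "c", "d", "e");
        Result l = parseLeft(words);
        Result r = parseRight(words);
        System.out.println("left:  " + l.value + " max depth " + l.maxDepth);
        System.out.println("right: " + r.value + " max depth " + r.maxDepth);
    }
}
```

For five words the left-recursive version never holds more than two symbols, while the right-recursive one holds all five words plus the empty text before it can reduce anything. Both build `abcde`, because string concatenation is associative.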

Your text rule does not allow NL to be immediately followed by TEXT (or EOF); there must be whitespace after line. If that's what you want, then it is fine.
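One way to check which inputs the rule accepts is a small hand-written Java recognizer for the same grammar. It is a sketch, not what byacc generates: the tokenizer is a hypothetical stand-in for the JFlex scanner, assuming each space or tab is one WHITESPACE token, each `\n` is one NL token, and a TEXT token is a maximal run of any other characters.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hand-written recognizer for the question's grammar (a sketch, not byacc output):
 *   text : empty | TEXT | TEXT whitespace text | TEXT line whitespace text ;
 *   whitespace : WHITESPACE+ ;   line : NL+ ;
 */
public class TextRuleCheck {
    enum Tok { TEXT, WHITESPACE, NL }

    // Hypothetical stand-in for the JFlex scanner.
    static List<Tok> tokenize(String s) {
        List<Tok> toks = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == ' ' || c == '\t') { toks.add(Tok.WHITESPACE); i++; }
            else if (c == '\n') { toks.add(Tok.NL); i++; }
            else {
                // a TEXT token is a maximal run of non-blank characters
                while (i < s.length() && " \t\n".indexOf(s.charAt(i)) < 0) i++;
                toks.add(Tok.TEXT);
            }
        }
        return toks;
    }

    // Returns true if toks[i..] derives `text`.
    static boolean text(List<Tok> toks, int i) {
        int n = toks.size();
        if (i == n) return true;                          // text : empty
        if (toks.get(i) != Tok.TEXT) return false;
        i++;
        if (i == n) return true;                          // text : TEXT
        if (toks.get(i) == Tok.NL) {                      // text : TEXT line whitespace text
            while (i < n && toks.get(i) == Tok.NL) i++;
            if (i == n) return false;                     // EOF right after line
        }
        if (toks.get(i) != Tok.WHITESPACE) return false;  // whitespace is mandatory here
        while (i < n && toks.get(i) == Tok.WHITESPACE) i++;
        return text(toks, i);
    }

    static boolean accepts(String s) { return text(tokenize(s), 0); }

    public static void main(String[] args) {
        for (String s : new String[] {"a b", "a\n b", "a\nb", "a\n"}) {
            System.out.println(s.replace("\n", "\\n") + " -> " + accepts(s));
        }
    }
}
```

Running `main` prints `true` for `a b` and `a\n b`, and `false` for `a\nb` and `a\n`: after a newline the grammar insists on whitespace before the next word, and it cannot end on a newline.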

Having text match an empty string will likely lead to conflicts if text is not your start symbol (e.g., if you have another rule like input: text line | input text line;).

Chris Dodd