5

I have some bison grammar:

input: /* empty */
       | input command
;

command:
        builtin
        | external
;

builtin:
        CD { printf("Changing to home directory...\n"); }
        | CD WORD { printf("Changing to directory %s\n", $2); }
;

I'm wondering how I get Bison to not accept (YYACCEPT?) something as a command until it reads ALL of the input. So I can have all these rules below that use recursion or whatever to build things up, which either results in a valid command or something that's not going to work.

One simple test I'm doing with the code above is just entering "cd mydir mydir". Bison parses CD and WORD and goes "hey! this is a command, put it to the top!". Then the next token it finds is just WORD, which has no rule, and then it reports an error.

I want it to read the whole line and realize CD WORD WORD is not a rule, and then report an error. I think I'm missing something obvious and would greatly appreciate any help - thanks!

Also - I've tried using input command NEWLINE or something similar, but it still pushes CD WORD to the top as a command and then parses the extra WORD separately.

chucknelson

4 Answers

2

Sometimes I deal with these cases by flattening my grammars.

In your case, it might make sense to add tokens to your lexer for newline and command separators (;) so you can put them explicitly in your Bison grammar. The parser will then expect a full line of input before accepting it as a command.

sep:   NEWLINE | SEMICOLON
   ;

command:  CD  sep
   |  CD WORD sep
   ;

Or, for an arbitrary list of arguments like a real shell:

args:
    /* empty */
  | args WORD
  ;

command:
      CD args sep
   ;
codenheim
  • This seems to work. It's a bummer, though, that I have to specifically mention that separator expression for each command. I might change over to arbitrary arguments at some point...but not yet! I'm still curious if there are other ways to do this... – chucknelson Apr 08 '10 at 16:38
  • Correction: this works with 2 words (cd hello hello), but at that point it pops the tokens off. Then it starts again for some reason. So "cd hello1 hello2 hello3" will pop off cd, hello1, and hello2, but then it will try to match a separate rule for hello3. I'm so confused... – chucknelson Apr 08 '10 at 16:47
  • If you use the "args" rule as in the 2nd portion above it should match an arbitrary number. – codenheim Apr 08 '10 at 17:48
  • Still don't fully understand all of this, and I still get some wacky results in Bison, but this definitely helped. – chucknelson Apr 10 '10 at 15:54
1

Instead of calling actions directly, just build yourself an Abstract Syntax Tree first. Then, depending on the result and your preference, you either execute part of it or nothing at all. If there is a parsing error during tree building, you may want to use the %destructor directive to tell Bison how to do the cleanup.

That actually is a proper way of doing it as you get full control over the contents and logic and you let bison just take care of parsing.
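
As a rough illustration of the idea, here is a minimal C sketch of a node that grammar actions could build instead of calling printf() directly. Every name in it (command, command_new, command_add_arg, command_check, command_free) is hypothetical and not part of the original grammar, and the "cd takes at most one argument" check is only an assumed example of a validation you might run once the whole line has been parsed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node for one command; names are illustrative only. */
typedef struct command {
    char *name;     /* e.g. "cd" */
    char **args;    /* owned copies of the WORD arguments */
    size_t argc;    /* number of arguments stored in args */
} command;

/* Duplicate a string with malloc; returns NULL on allocation failure. */
static char *copy_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p) memcpy(p, s, len);
    return p;
}

/* Create an empty command node; returns NULL on allocation failure. */
command *command_new(const char *name) {
    command *c = calloc(1, sizeof *c);
    if (!c) return NULL;
    c->name = copy_string(name);
    if (!c->name) { free(c); return NULL; }
    return c;
}

/* Append one argument; returns 0 on success, -1 on allocation failure. */
int command_add_arg(command *c, const char *word) {
    char **grown = realloc(c->args, (c->argc + 1) * sizeof *grown);
    if (!grown) return -1;
    c->args = grown;
    char *copy = copy_string(word);
    if (!copy) return -1;
    c->args[c->argc++] = copy;
    return 0;
}

/* Release a node; the kind of cleanup a %destructor could delegate to. */
void command_free(command *c) {
    if (!c) return;
    for (size_t i = 0; i < c->argc; i++) free(c->args[i]);
    free(c->args);
    free(c->name);
    free(c);
}

/* Validate a fully built node: 0 if it may be executed, -1 otherwise.
   The rule for "cd" is an assumption made for this example. */
int command_check(const command *c) {
    if (strcmp(c->name, "cd") == 0 && c->argc > 1) {
        fprintf(stderr, "cd: too many arguments\n");
        return -1;
    }
    return 0;
}
```

With something like this, the grammar action for a command would only build the node, and the caller would run command_check() after the parse finishes, so "cd mydir mydir" is rejected as a whole rather than half-executed.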

Michal M
  • Thanks for the answer - in a current class I'm in we are doing a project where we parse a language, build an AST, and generate code. Sadly I didn't have that experience back in the class where I was using Bison and YACC. Thanks again, I'll probably think about the problem differently next time I have to do something similar. – chucknelson Mar 22 '11 at 18:33
0

Usually, things aren't done the way you describe.

With Bison/Yacc/Lex, one usually designs the syntax carefully to do exactly what is needed. Because Lex matches greedily (it prefers the longest possible match), this should help you.

So, how about this instead.

Since you are parsing whole lines at a time, I think we can use this fact to our advantage and revise the syntax.

input : /* empty */
      | line


command-break : command-break semi-colon
              | semi-colon

line : commands new-line

commands : commands command-break command
         | commands command-break command command-break
         | command
         | command command-break

...

Where new-line and semi-colon are tokens defined in your lex source, matching something like \n and ; respectively. This should give you the UNIX-style syntax for commands that you are looking for. All sorts of things are possible; the grammar is a little bloated, since it allows multiple semicolons, and it doesn't take white-space into consideration, but you should get the idea.

Lex and Yacc are powerful tools, and I find them quite enjoyable - at least, when you aren't on a deadline.

rlb.usa
0

Couldn't you just change your rule match actions to append to a list of actions you want to perform if the whole thing works? Then after the entire input has been processed you decide if you want to do what was in that list of actions based on if you saw any parse errors.
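
A minimal C sketch of that idea, under some assumptions: the action_queue type and the queue_action, finish_line and queue_destroy functions are hypothetical names invented here, and each "action" is simplified to a message that would be printed. Grammar actions would call queue_action() where they currently call printf(), and the caller would pass an error count (for example, derived from the return value of yyparse()) to finish_line() once the input has been processed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical deferred-action queue; initialise with {0} before use. */
typedef struct {
    char **items;   /* heap-allocated copies of the queued messages */
    size_t count;   /* number of queued messages */
    size_t cap;     /* allocated slots in items */
} action_queue;

/* Queue a copy of msg; returns 0 on success, -1 on allocation failure. */
int queue_action(action_queue *q, const char *msg) {
    if (q->count == q->cap) {
        size_t new_cap = q->cap ? q->cap * 2 : 4;
        char **grown = realloc(q->items, new_cap * sizeof *grown);
        if (!grown) return -1;
        q->items = grown;
        q->cap = new_cap;
    }
    size_t len = strlen(msg) + 1;
    char *copy = malloc(len);
    if (!copy) return -1;
    memcpy(copy, msg, len);
    q->items[q->count++] = copy;
    return 0;
}

/* Perform the queued actions only if there were no parse errors, then
   empty the queue either way. Returns how many actions were performed. */
size_t finish_line(action_queue *q, int parse_errors) {
    size_t ran = 0;
    for (size_t i = 0; i < q->count; i++) {
        if (parse_errors == 0) {
            puts(q->items[i]);
            ran++;
        }
        free(q->items[i]);
    }
    q->count = 0;
    return ran;
}

/* Release the queue's storage, including any messages still pending. */
void queue_destroy(action_queue *q) {
    for (size_t i = 0; i < q->count; i++) free(q->items[i]);
    free(q->items);
    q->items = NULL;
    q->count = 0;
    q->cap = 0;
}
```

The trade-off is that side effects are delayed until the end of the input, which suits a line-at-a-time shell but would need adjusting if actions depend on the results of earlier ones.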

nategoose