
Suppose I define a happy grammar

%name pf f
%tokentype { AB }

%error { parseError }

%token
    a { A }
    b { B }
%%

f : 
  a g a {}
  | b {}
g :
   b b {}

{
data AB = A | B deriving (Eq,Ord,Show)
parseError _ = error " bad "
}

If I compile this with

happy --glr

I am generally interested in grammars with non-trivial ambiguities; however, this example demonstrates the part that is confusing me.

I get a Haskell parser. It succeeds only when the token stream is a b b a or b.

However, I am much more interested in failure. I would like to fail as fast as possible, and the parser seems to need more tokens than I would expect to be required.

For example, if I feed it the token stream a, a, a, ..., it takes until the third a to fail. If I feed it b b b, it takes until the third b to fail. Why the extra lookahead? When matching f, once I have seen two 'a's, nothing in the grammar can match.
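To make the expectation concrete, here is a minimal sketch of the earliest point at which failure could be detected. It is not how happy's GLR parser works; it is just a brute-force check against the two sentences this grammar accepts. The names sentences, viable, and firstBadToken are hypothetical and exist only for this illustration.

```haskell
import Data.List (inits, isPrefixOf)

data AB = A | B deriving (Eq, Ord, Show)

-- The grammar above accepts exactly these two token sequences.
sentences :: [[AB]]
sentences = [[A, B, B, A], [B]]

-- A prefix is viable if at least one sentence starts with it.
viable :: [AB] -> Bool
viable p = any (p `isPrefixOf`) sentences

-- 1-based position of the first token after which no sentence can
-- match any more, or Nothing if every prefix of the input is viable.
-- 'inits' is lazy, so this also works on an infinite token stream.
firstBadToken :: [AB] -> Maybe Int
firstBadToken ts =
  case [n | (n, p) <- zip [0 :: Int ..] (inits ts), not (viable p)] of
    (n : _) -> Just n
    []      -> Nothing
```

Under this definition, firstBadToken (repeat A) and firstBadToken [B, B, B] both report the second token, which is the point where I would expect the generated parser to fail.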

Jonathan Gallagher
  • In fact, if I modify the token type to be ABCD = A | B | C | D, add the tokens c { C } and d { D }, and add a parser h : c {}, then the generated parser requires two 'c's and two 'd's to fail. The token 'd' is completely unused, and happy recognizes that 'c' is unused by f. So I am even more confused. – Jonathan Gallagher Jun 24 '14 at 01:04
  • One further clarification: the problem is with parsing streams. If I submit "a" "a" and then close the stream, or "c" and then close the stream, I do indeed get failure. The only tricky part is submitting the stream lazily. – Jonathan Gallagher Jun 24 '14 at 01:46
