0

I am trying to define a non terminal symbol in a LALR(1) grammar (with CUP parser). It is requested that

the <code> token must appear exactly two times, 
while <hour> token must appear at least one time.

In the end I came up with this definition:

section     ::= hour_l CODE SC hour_l CODE SC hour_l ;
hour_l      ::= /* epsilon */ 
            | hour_l HOUR SC ;

Where SC is a separator (semicolon) between tokens and hour_l is the non terminal symbol for hour's list. This solution has a problem: HOUR can be not present, because epsilon can be reduced to hour_l. There is a clever solution than specifying all possibilities or using the semantic capabilities of CUP (ie. putting a counter of how many times HOUR is present in section)? I'd prefer not to use semantics in order to achieve this; in fact, it seems to me it is syntax related.

rds
  • 26,253
  • 19
  • 107
  • 134
gc5
  • 9,468
  • 24
  • 90
  • 151

1 Answers1

0

My solution, suggested OOB by a friend, is to use a Finite State Machine:

section     ::= c ;
a           ::= CODE SC ;
b           ::= a CODE SC ;
c           ::= c HOUR SC | b HOUR SC | e CODE SC ;
d           ::= HOUR SC | d HOUR SC ;
e           ::= e HOUR SC | a HOUR SC | d CODE SC ;

c is the final state accepted by this machine. I drew a Deterministic Finite Automata

DFA

and transformed it in a Right Regular Grammar.

Hope it helps. Thanks

gc5
  • 9,468
  • 24
  • 90
  • 151