
I'm creating a parser for Pascal and I'm stuck at conditional statements.

Suppose I have this code snippet:

if ((10 mod 3) = 1) then ...

This is a valid Pascal if statement. However, when I try to come up with an LL(1) grammar for the ((10 mod 3) = 1) expression, I crash and burn on the parentheses. The problem is that the condition above can also be written as if (10 mod 3) = 1 then ... or, thanks to operator precedence, as if 10 mod 3 = 1 then ...

I have a typical LL(1) grammar for arithmetic expressions:

E  -> TE'
E' -> ADD_SUB TE' | epsilon
T  -> FT'
T' -> MUL_DIV FT' | epsilon
F  -> 'number' | '(' E ')'

ADD_SUB -> '+' | '-'
MUL_DIV -> '*' | 'div' | 'mod'

However, I can't come up with an LL(1) grammar for the whole condition. I thought of something like:

CE  -> CT CE'
CE' -> 'or' CT CE' | epsilon
CT  -> CF CT'
CT' -> 'and' CF CT' | epsilon
CF  -> E REL-OP E | '(' E REL-OP E ')' | 'not' E REL-OP E

REL-OP -> '=' | '<' | '<=' | '>' | '>=' | '<>'

Where E is E from the arithmetic expression grammar above.

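This isn't LL(1), because the CF -> E REL-OP E and CF -> '(' E REL-OP E ')' rules have a first-first collision on '(': E itself can start with '(' (via F -> '(' E ')'), so one token of lookahead cannot choose between the two alternatives.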
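The collision can be checked mechanically. Below is a minimal Python sketch that encodes the two grammars above and computes FIRST sets with the usual fixed-point iteration. The names GRAMMAR, first_of_sequence and compute_first are made up for this illustration and do not come from any library.

```python
EPS = "epsilon"

# The grammars from the question; any symbol that is not a key is a terminal.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["ADD_SUB", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["MUL_DIV", "F", "T'"], [EPS]],
    "F": [["number"], ["(", "E", ")"]],
    "ADD_SUB": [["+"], ["-"]],
    "MUL_DIV": [["*"], ["div"], ["mod"]],
    "REL-OP": [["="], ["<"], ["<="], [">"], [">="], ["<>"]],
    "CF": [
        ["E", "REL-OP", "E"],
        ["(", "E", "REL-OP", "E", ")"],
        ["not", "E", "REL-OP", "E"],
    ],
}


def first_of_sequence(seq, first):
    """FIRST set of a sequence of symbols, given FIRST sets of non-terminals."""
    result = set()
    for sym in seq:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in the sequence can derive epsilon
    return result


def compute_first():
    """Iterate until no FIRST set grows any further."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


first = compute_first()
cf_first = [first_of_sequence(alt, first) for alt in GRAMMAR["CF"]]
collision = cf_first[0] & cf_first[1]
print(sorted(collision))
```

The intersection of the FIRST sets of the first two CF alternatives is non-empty, which is exactly the condition that rules out LL(1).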

Any ideas how to fix the first-first collision?

isklenar

1 Answer

If I recall correctly, a Pascal expression can include comparison operators and boolean operators, since it has a boolean type. So boolean expressions can appear anywhere an expression can appear, and not just in if statements.

So you need to extend expression (or E, as you write it) in such a way that (10 mod 3) = 1 is an expression (and consequently ((10 mod 3) = 1) is an expression), and then an if statement starts with "if" expression "then".
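As a rough illustration of that approach, here is a minimal recursive-descent sketch in Python for a single unified expression grammar. It assumes the standard Pascal precedence levels ('not' binds tightest, then '*', 'div', 'mod', 'and', then '+', '-', 'or', with the relational operators lowest and non-associative), handles only integer literals, and uses loops where the grammar would have E'-style tail productions. The names tokenize, Parser and parse are hypothetical, chosen for this example only.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(<=|>=|<>|[-+*()=<>])|([A-Za-z]+))")

REL_OPS = {"=", "<>", "<", "<=", ">", ">="}
ADD_OPS = {"+", "-", "or"}
MUL_OPS = {"*", "div", "mod", "and"}


def tokenize(text):
    """Split the input into ints, operator symbols and lowercase keywords."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        number, symbol, word = m.groups()
        tokens.append(int(number) if number else symbol or word.lower())
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    # expression -> simple [ REL_OP simple ]
    def expression(self):
        left = self.simple()
        if self.peek() in REL_OPS:
            op = self.advance()
            left = (op, left, self.simple())
        return left

    # simple -> term { ADD_OP term }
    def simple(self):
        left = self.term()
        while self.peek() in ADD_OPS:
            op = self.advance()
            left = (op, left, self.term())
        return left

    # term -> factor { MUL_OP factor }
    def term(self):
        left = self.factor()
        while self.peek() in MUL_OPS:
            op = self.advance()
            left = (op, left, self.factor())
        return left

    # factor -> number | 'not' factor | '(' expression ')'
    def factor(self):
        tok = self.advance()
        if isinstance(tok, int):
            return tok
        if tok == "not":
            return ("not", self.factor())
        if tok == "(":
            inner = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree


print(parse("((10 mod 3) = 1)"))
```

Because '(' only ever introduces a parenthesised expression, every decision is made on a single token of lookahead, and all three spellings of the condition from the question produce the same tree. Type checking (for example, rejecting 'not 3') would be left to a later semantic pass.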

If you really want to create a separate syntactic category for conditional expressions (CE), then you have to go all the way down to the bottom of the precedence hierarchy, so that you would end up with a list of productions which starts with something like

CE  -> CT CE'
CE' -> "or" CT CE' | epsilon  

and ends with

CF  -> 'number' | '(' CE ')'

The last of those productions would be simple duplicates of the existing expression grammar, with a consistent change to non-terminal names. But that's a lot of unnecessary duplication.

rici