
I'm trying to write a grammar for an assembler-like language that doesn't use mnemonics.

I have a few registers, let's say:

a, b, c, d

And one special register, which holds a memory address:

&e

Now I want to allow assigning values to them:

a = b
d = a
c = &e

a is also a special register (the accumulator), so some operations can be performed only on it, for example:

a = a xor d

All of them have a on the left side and any one of the registers on the right side.

My grammar:

grammar somename;
options {
    language = CSharp;
}
program: line* EOF;

line: statement (NEWLINE+ | EOF);

statement: aOperation | registerAssignment;

expression:
    or #orAssignment
    | xor #xorAssignment;

or:
    OR reg;

xor:
    XOR reg;

reg: e_read | REGISTER;

aOperation: REG_A '=' REG_A expression;

registerAssignment: reg '=' reg;

REGISTER:
    REG_A
    | 'b'
    | 'c'
    | 'd';

e_read: E_READ;

REG_A: 'a';
OR: 'or';
XOR: 'xor';
E_READ: '&e';
WHITESPACE: (' ' | '\t')+ -> skip;
NEWLINE: ('\r'? '\n' | '\r');

Now I have a problem: the parser always treats the line a = a xor b as a = b. On the next round the parser gets the b register, finds nothing on the right-hand side, and throws an error:

An unhandled exception of type 'System.IndexOutOfRangeException' occurred in Program.dll: 'Index was outside the bounds of the array.'

How can I fix this?

Sousuke
    Your lexer will never produce a `REG_A` token because the `REGISTER` rule takes precedence. So `aOperation` can't match because the token you get is a `REGISTER`, but `aOperation` needs a `REG_A`. – sepp2k Apr 27 '23 at 16:53

1 Answer

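As sepp2k mentioned in the comments, the lexer will never produce a REG_A token, because the input 'a' is already matched by the REGISTER rule. Both rules match the same single character, and when two lexer rules match the same amount of input, ANTLR picks the one that is defined first, which here is REGISTER.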

A solution would be to remove the REGISTER lexer rule and create a register parser rule:

register
 : REG_A
 | REG_B
 | REG_C
 | REG_D
 ;

REG_A: 'a';
REG_B: 'b';
REG_C: 'c';
REG_D: 'd';
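
For context, here is roughly how the whole grammar from the question might look with that change applied. Treat it as an untested sketch rather than a drop-in replacement: it assumes that or takes the same operand as xor, and that the operand can be any register or &e. The rule names are taken from the question where possible.

```antlr
grammar somename;

options {
    language = CSharp;
}

program: line* EOF;

line: statement (NEWLINE+ | EOF);

statement: aOperation | registerAssignment;

// a = a or <reg>  /  a = a xor <reg>
aOperation: REG_A '=' REG_A expression;

// Assumption: both operators take a single register operand.
expression
    : OR reg   #orAssignment
    | XOR reg  #xorAssignment
    ;

registerAssignment: reg '=' reg;

// Any general register, or the memory read through &e.
reg: e_read | register;

e_read: E_READ;

// Parser rule replacing the old REGISTER lexer rule,
// so 'a' is always tokenized as REG_A.
register
    : REG_A
    | REG_B
    | REG_C
    | REG_D
    ;

REG_A: 'a';
REG_B: 'b';
REG_C: 'c';
REG_D: 'd';
OR: 'or';
XOR: 'xor';
E_READ: '&e';
WHITESPACE: (' ' | '\t')+ -> skip;
NEWLINE: ('\r'? '\n' | '\r');
```

With this, a = a xor b should tokenize as REG_A '=' REG_A XOR REG_B and be matched by aOperation, while a = b falls through to registerAssignment.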
Bart Kiers