I have a .g4 grammar for a lexer/parser, where the lexer skips line continuation tokens. Not skipping them breaks the parser and isn't an option. Here's the lexer rule in question:

LINE_CONTINUATION : ' ' '_' '\r'? '\n' -> skip;

The problem is that whenever a continued line starts at column 1, the parser blows up:

Sub Test()
Debug.Print "Some text " & _
vbNewLine & "Some more text"    
End Sub
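
One plausible reading of the failure, sketched in Python rather than ANTLR: the skipped continuation swallows the only whitespace between `&` and `vbNewLine`, so a parser rule that expects a WS token there never sees one. The `tokenize` helper only imitates an ANTLR lexer (longest match wins, ties go to the rule listed first); `STRING`, `AMP` and `IDENT` are hypothetical stand-ins for the rest of VBA.g4, and the `NEWLINE` and `WS` rules are assumed to be the ones quoted later in this question.

```python
import re

def tokenize(text, rules, skipped=()):
    """Imitate an ANTLR lexer: longest match wins, ties go to the first rule."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name not in skipped:  # mimics '-> skip'
            tokens.append(best_name)
        pos += best_len
    return tokens

# Whitespace rules from the question; the last three rules are hypothetical.
ORIGINAL = [
    ("LINE_CONTINUATION", r" _\r?\n"),
    ("NEWLINE", r"[ \t]*\r?\n[ \t]*"),
    ("WS", r"[ \t]+"),
    ("STRING", r'"[^"\r\n]*"'),
    ("AMP", r"&"),
    ("IDENT", r"[A-Za-z]\w*"),
]

flush_left = tokenize('"Some text " & _\nvbNewLine',
                      ORIGINAL, skipped={"LINE_CONTINUATION"})
with_indent = tokenize('"Some text " & _\n    vbNewLine',
                       ORIGINAL, skipped={"LINE_CONTINUATION"})

print(flush_left)   # ['STRING', 'WS', 'AMP', 'IDENT']
print(with_indent)  # ['STRING', 'WS', 'AMP', 'WS', 'IDENT']
```

With indentation, the leading spaces of the continued line supply the WS token that the skipped continuation removed; at column 1 nothing does.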

I thought "Hey I know! I'll just pre-process the string I'm feeding ANTLR to insert an extra whitespace before the underscore, and change the grammar to accept it!"

So I changed the rule like this:

LINE_CONTINUATION : WS? WS '_' NEWLINE -> skip;
NEWLINE : WS? ('\r'? '\n') WS?; 
WS : [ \t]+;

...and the test code above gave me this parser error:

extraneous input 'vbNewLine' expecting WS

For now my only solution is to tell my users to properly indent their code. Is there any way I can fix that grammar rule?

(Full VBA.g4 grammar file on GitHub)

Mathieu Guindon

1 Answer

You basically want line continuation to be treated like whitespace.

OK, then add the lexical definition of line continuation to the WS token. WS will then pick up the line continuation, and you no longer need LINE_CONTINUATION anywhere.

//LINE_CONTINUATION : ' ' '_' '\r'? '\n' -> skip;
NEWLINE : WS? ('\r'? '\n') WS?; 
WS : ([ \t]+)|(' ' '_' '\r'? '\n');
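
To see what this change does to the token stream, here is a rough Python imitation of the lexer, not ANTLR itself. The two WS alternatives are listed as separate entries so the longer one can win, as it would in ANTLR; `STRING`, `AMP` and `IDENT` are hypothetical stand-ins for the rest of the grammar, and `NEWLINE` is simplified to plain spaces and tabs around the line break.

```python
import re

def tokenize(text, rules):
    """Imitate an ANTLR lexer: longest match wins, ties go to the first rule."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append(best_name)
        pos += best_len
    return tokens

ANSWER = [
    ("NEWLINE", r"[ \t]*\r?\n[ \t]*"),
    ("WS", r"[ \t]+"),       # first alternative of WS
    ("WS", r" _\r?\n"),      # second alternative: the line continuation
    ("STRING", r'"[^"\r\n]*"'),  # hypothetical
    ("AMP", r"&"),               # hypothetical
    ("IDENT", r"[A-Za-z]\w*"),   # hypothetical
]

flush_left = tokenize('"Some text " & _\nvbNewLine', ANSWER)
with_indent = tokenize('"Some text " & _\n    vbNewLine', ANSWER)

print(flush_left)   # ['STRING', 'WS', 'AMP', 'WS', 'IDENT']
print(with_indent)  # ['STRING', 'WS', 'AMP', 'WS', 'WS', 'IDENT']
```

The column-1 case now has a WS token between `&` and the identifier. The indented case yields two WS tokens in a row; whether the parser accepts that depends on parser rules that are not shown here.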
Ira Baxter
  • Spoke too fast. It worked.... *for the specific case in the OP* - so I tried changing the `WS` rule to `WS : [ \t]+ ('_' '\r'? '\n')?;`, and now it works and supports weird things like `Option Base 1` being split into `Option _\r\nBase _\r\n1`, which is awesome - but it breaks whenever a continued line has any indentation and I don't understand why, since the definition as I understand it should *also* match one or more space/tab... got a clue? – Mathieu Guindon Jan 07 '16 at 04:34
  • I think I would have defined things differently: HWS = [ \t]+; ENDLINE = \r? \n; NEWLINE = HWS? ENDLINE; WS = HWS (ENDLINE HWS?)?; This last bit handles your "continued line has indentation" case. The rest is just factoring to make it easier to understand. (HWS == "horizontal white space"). – Ira Baxter Jan 07 '16 at 05:10
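
A hedged Python sketch of why the variant from the first comment trips on indentation while the factored definition does not. It imitates the lexer with regular expressions and is not ANTLR itself. Assumptions: `HWS` and `ENDLINE` behave like fragments (they never become tokens on their own), the `_` from the original rule sits before `ENDLINE` in `WS` (the comment's shorthand does not show it), and `STRING`, `AMP` and `IDENT` are hypothetical stand-ins for the rest of the grammar.

```python
import re

def tokenize(text, rules):
    """Imitate an ANTLR lexer: longest match wins, ties go to the first rule."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append(best_name)
        pos += best_len
    return tokens

HWS = r"[ \t]+"      # horizontal white space (assumed to be a fragment)
ENDLINE = r"\r?\n"   # assumed to be a fragment
OTHER = [            # hypothetical stand-ins for the rest of the grammar
    ("STRING", r'"[^"\r\n]*"'),
    ("AMP", r"&"),
    ("IDENT", r"[A-Za-z]\w*"),
]

# WS : [ \t]+ ('_' '\r'? '\n')?;  stops right after the line break
VARIANT = [("NEWLINE", rf"(?:{HWS})?{ENDLINE}(?:{HWS})?"),
           ("WS", rf"{HWS}(?:_{ENDLINE})?")] + OTHER

# WS = HWS ('_' ENDLINE HWS?)?  also absorbs the next line's indentation
FACTORED = [("NEWLINE", rf"(?:{HWS})?{ENDLINE}"),
            ("WS", rf"{HWS}(?:_{ENDLINE}(?:{HWS})?)?")] + OTHER

indented = '"Some text " & _\n    vbNewLine'
flush = '"Some text " & _\nvbNewLine'

variant_tokens = tokenize(indented, VARIANT)
factored_tokens = tokenize(indented, FACTORED)
factored_flush = tokenize(flush, FACTORED)

print(variant_tokens)   # ['STRING', 'WS', 'AMP', 'WS', 'WS', 'IDENT']
print(factored_tokens)  # ['STRING', 'WS', 'AMP', 'WS', 'IDENT']
print(factored_flush)   # ['STRING', 'WS', 'AMP', 'WS', 'IDENT']
```

In the variant, the continuation and the following indentation come out as two separate WS tokens, which a parser expecting a single WS would likely reject. The factored rule folds both into one token, so the indented and column-1 cases look the same to the parser.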