Terminating blocks of multiline text in Xtext/ANTLR

Question

I am starting to get a handle on Xtext, but I am still having a bit of trouble separating semantic sections by multiple newlines.

grammar org.example.dsl.MyDsl hidden(WS)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate words "http://www.example.org/dsl"

Document:
    sections+=Paragraph+
;

Paragraph:
    lines+=Text+
    ( -> NL)
;

Text: 
    value=WordGroup
    NL
;

WordGroup: SIMPLE_WORD+;

terminal SIMPLE_WORD: 
    ('0'..'9' | 'a'..'z' | 'A'..'Z') 
    ('0'..'9' | 'a'..'z' | 'A'..'Z' | '-' | '_' | '.')*
;
terminal NL: ('\r'? '\n');
terminal WS: (' ' | '\t');

This is ok...

    @Test
    def void happyPath() {
        val model = parseHelper.parse('''
            The quick brown fox
            Jumps over the lazy dog

        ''')

        assertThat(model, notNullValue())
        assertThat(model.eResource.errors, equalTo(#[]))
        assertThat(model.sections.size(), equalTo(1))
        assertThat(model.sections.get(0).lines.size(), equalTo(2))
        // It works!
    }

But this is not...

    @Test
    def void noTrailingNewlines() {
        val model = parseHelper.parse('''
            The quick brown fox
            Jumps over the lazy dog
        ''')

        assertThat(model, notNullValue())
        assertThat(model.eResource.errors, equalTo(#[]))
        // Fail ^^^ XtextSyntaxDiagnostic: null:2 mismatched input '<EOF>' expecting RULE_NL
        assertThat(model.sections.size(), equalTo(1))
        assertThat(model.sections.get(0).lines.size(), equalTo(2))
    }

Both should be valid, parsable text, but I can't get the grammar to accept a single NL when it is the last character in the input.

I tried the obvious ( -> NL?)...

Paragraph:
    lines+=Text+
    ( -> NL?)
;

...and this actually does cause the tests to pass, only now I have ANTLR warnings.

And as I suspected, this just kicks the problem down the road...

    @Test
    def void multipleParagraphs() {
        val model = parseHelper.parse('''
            The quick brown fox
            Jumps over the lazy dog

            But only on days that end in Y
        ''')

        assertThat(model, notNullValue())
        assertThat(model.eResource.errors, equalTo(#[]))
        assertThat(model.sections.size(), equalTo(2)) //Expected: <2> but: was <1>
    }

if a Text can contain any number of NL how shall it ever end? — Christian Dietrich, May 14 '20 at 08:55
Text can contain at most one NL at the end, terminating as soon as it encounters NL NL. This was my intent at least. I'll play around with it a bit more and see if I can figure out where my grammar does not match my intent. I think maybe individual line processing? This could also help with formatting... I need to be able to preserve the newlines in the text, but not those between paragraphs (beyond formatting concerns). — drkstr101, May 14 '20 at 19:16
ah... I think I see what you're getting at. my terminal rules in Text are no good. I will do better and post back if I get stuck again. Thanks for the pointer! — drkstr101, May 14 '20 at 19:29
Would adding an optional newline at the end of a document fix this? Something like `Document: sections+=Paragraph+ NL? ;` — RedKnite, May 16 '20 at 00:37
@RedKnite I've updated my questions to show my closest attempt in case you missed it. After making your suggested change to the *new* grammar def above, there was no change (first test pass, second fail). My changes could have affected your suggestion though. — drkstr101, May 16 '20 at 00:49
Does it still fail the second one if you remove the `( -> NL)` part of the paragraph rule while incorporating the optional newline in the document rule? — RedKnite, May 16 '20 at 02:50
Sadly, it does. I forgot to mention I tried this as well. I was going to play around with a bit using the basic idea in various ways. I'll post back if I make any progress, but so far all my attempts to deviate from what I have above lead to alternate rule warnings, or even worse results. — drkstr101, May 16 '20 at 03:34
Awesome! I can now get both cases to pass if I can live with ANTLR warnings. I would prefer not to if at all possible. Should I be looking at the ANTLR warnings in the same sense as backtracking (i.e. probably an indicator of an underlying problem that should be fixed)? — drkstr101, May 16 '20 at 08:55
maybe move the nl up `Document: sections+=Paragraph (NL sections+=Paragraph)*`; — Christian Dietrich, May 17 '20 at 09:30
Oh that's a great idea! I will play around with that (and a few variations) and see if I can get it to work better. — drkstr101, May 17 '20 at 22:42
@ChristianDietrich That did the trick! I've documented the results in the answer below. — drkstr101, May 19 '20 at 17:55

drkstr101 · Accepted Answer · 2020-05-19T17:51:41.400

And we have a winner!

Many thanks to @christian-dietrich and @redknite, who were incredibly patient with me as they helped me work through the problem.

Document:
    sections+=Paragraph
    (NL sections+=Paragraph)*
;

Paragraph:
    lines+=Text+
;

Text: 
    value=WordGroup
    NL
;

@Test
def void singleParagraph() {
    val model = parseHelper.parse('''
        The quick brown fox
        Jumps over the lazy dog
    ''')

    assertThat(model, notNullValue())
    assertThat(model.eResource.errors, equalTo(#[]))

    val doc = model.document
    assertThat(doc, notNullValue())
    assertThat(doc.sections.size(), equalTo(1))
    assertThat(doc.sections.get(0).lines.size(), equalTo(2))
}

@Test
def void multiParagraph() {
    val model = parseHelper.parse('''
        The quick brown fox
        Jumps over the lazy dog

        But only on days that end in Y
    ''')

    assertThat(model, notNullValue())
    assertThat(model.eResource.errors, equalTo(#[]))

    val doc = model.document
    assertThat(doc, notNullValue())
    assertThat(doc.sections.size(), equalTo(2))
    assertThat(doc.sections.get(0).lines.size(), equalTo(2))
}

Both pass!
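
For reference, here is a sketch of how the three rules above might fit together with the header and terminal rules from the question. It assumes those parts are unchanged and keeps Document as the root rule. Note that the tests above read `model.document`, so my actual grammar has an extra root element that is not shown here; treat this as an illustration rather than the exact file.

```xtext
grammar org.example.dsl.MyDsl hidden(WS)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate words "http://www.example.org/dsl"

// One paragraph, optionally followed by more paragraphs. Each further
// paragraph is introduced by one extra NL (the blank line between them).
Document:
    sections+=Paragraph
    (NL sections+=Paragraph)*
;

// A paragraph is one or more lines. It no longer consumes a trailing NL itself.
Paragraph:
    lines+=Text+
;

// Every line ends with exactly one NL, including the last line of the input.
Text:
    value=WordGroup
    NL
;

WordGroup: SIMPLE_WORD+;

terminal SIMPLE_WORD:
    ('0'..'9' | 'a'..'z' | 'A'..'Z')
    ('0'..'9' | 'a'..'z' | 'A'..'Z' | '-' | '_' | '.')*
;
terminal NL: ('\r'? '\n');
terminal WS: (' ' | '\t');
```

The key change is that the separating blank line now belongs to Document rather than to Paragraph, so the last paragraph only needs the NL that ends its final line. An input that ends with a trailing blank line (as in the first test of the question) is not exercised by the two tests above.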
