
It seems like a trivial question, but I am stuck on parsing the end of file (EOF) with my own island grammar. I am using the new VS Code extension, btw.

I've mostly been using the examples from the basic recipes and have a simple grammar with the following layout rules:

layout Whitespace = [\t-\n\r\ ]*;
lexical IntegerLiteral = [0-9]+ !>> [0-9];
lexical Comment = "%%" ![\n]* $;

With this layout and a few other rules, it parses some simple files but gives a parse error whenever a file ends in a newline (newlines between lines are no problem).

Am I missing something obvious?

Thanks!

Jay05

2 Answers


It sounds a bit like your grammar is missing a start nonterminal. All grammar rules get whitespace in between their constituent symbols but not at the start or the end.

A start nonterminal is the exception:

start syntax Islands = Island+;

Islands parseIslands(loc input)
    = parse(#start[Islands], input).top;

Passing the start nonterminal to parse will allow the file to start and end with whitespace, and using the .top field you can ignore that whitespace from the parse tree again by projecting out the middle Islands tree.
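
To make the difference concrete, here is a minimal sketch of a complete module. The module name IslandsDemo, the choice of IntegerLiteral as the island, and the two helper functions are assumptions made for illustration; the layout and lexical definitions are taken from the question, and the input is a str rather than a loc to keep the example self-contained.

```rascal
module IslandsDemo

import ParseTree;

// Layout and lexical copied from the question.
layout Whitespace = [\t-\n\r\ ]*;
lexical IntegerLiteral = [0-9]+ !>> [0-9];

// Hypothetical island grammar: one or more integer literals.
start syntax Islands = IntegerLiteral+;

// Plain nonterminal: layout is only inserted between the symbols of a rule,
// so whitespace before the first or after the last island is not accepted.
Islands parsePlain(str input)
    = parse(#Islands, input);

// Start nonterminal: layout is also accepted before and after Islands,
// and .top projects the Islands tree back out.
Islands parseWithStart(str input)
    = parse(#start[Islands], input).top;
```

With these definitions, parseWithStart("1 2 3\n") should succeed, while parsePlain("1 2 3\n") should throw a parse error because nothing in Islands can consume the trailing newline.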

Jurgen Vinju
  • This did the trick :) My parse function looked like this: "parse(#Prog, location);" So adding '#start[Prog],location).top' solved it! Is this clearly explained somewhere in the docs that you know of? (Because I don't fully understand it yet :) ) Thanks for the help!! – Jay05 Nov 06 '22 at 18:24
  • This answer has a little more detail on how and why it works: https://stackoverflow.com/questions/32205620/layout-in-rascal – Jurgen Vinju Nov 08 '22 at 09:02
  • that information was copied from here in the manual: https://www.rascal-mpl.org/docs/Rascal/Declarations/SyntaxDefinition/#:~:text=The%20start%20modifier%20identifies – Jurgen Vinju Nov 08 '22 at 09:04

Island grammars tend to be a complex beast, so without sharing the full grammar and input string, it might be a bit hard to answer this question. But I'll share some generic feedback.

The layout production might be ambiguous if any other part of your language has optional parts. Rascal's parsing is non-greedy. So if you have:

lexical A = "a";
lexical B = "b";
lexical C = "c";
syntax D = A? B? C;

After fusing in the layouts, this becomes:

D' = A? Whitespace? B? Whitespace? C;

Now, since Whitespace does not have to consume all available whitespace characters, the grammar is ambiguous: the parser can "bind" the whitespace either between the A and B or between the B and C. So in most cases you want to make the match greedy by adding a follow restriction:

layout Whitespace = [\t-\n \r \ ]* !>> [\t-\n \r \ ];

Also, I fixed a bug: the layout definition didn't include a space as valid whitespace. Rascal allows spaces in a character class (for readability), so to include a literal space you have to escape it as \ .
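
One way to check whether a follow restriction removes the ambiguity is to parse with allowAmbiguity=true and look for amb nodes in the resulting tree. The sketch below assumes a hypothetical module name GreedyLayoutDemo, a hypothetical helper isAmbiguous, and a syntax rule named D for the A? B? C example; it is an illustration, not a full debugging recipe.

```rascal
module GreedyLayoutDemo

import ParseTree;

// Greedy layout: a Whitespace match may not be followed by more whitespace.
layout Whitespace = [\t-\n\r\ ]* !>> [\t-\n\r\ ];

lexical A = "a";
lexical B = "b";
lexical C = "c";

// Hypothetical rule with optional parts, as in the example above.
start syntax D = A? B? C;

// True if the parse forest for the input contains an ambiguity node.
bool isAmbiguous(str input)
    = /amb(_) := parse(#start[D], input, allowAmbiguity=true);
```

For an input such as "a  c" (two spaces, no b), the restriction forces the first Whitespace to take both spaces, so isAmbiguous should report false. Dropping the !>> part of the layout is the case that lets the spaces be divided between the two Whitespace positions.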

For the rest it looks okay, but as I said at the start, island grammars are hard to debug without both the full syntax and a description of what you want as water and what as island.

Davy Landman
  • Thanks for the comprehensive answer, I absolutely have to study this grammar theory more. It did not yet solve my problem. To give a full example, I just copied the parse and syntax code from the 'func' recipe: https://www.rascal-mpl.org/docs/Recipes/Languages/Func/ConcreteSyntax/ Copying the example fact(n) function to a file parses successfully without a newline at EOF, and unsuccessfully if there is a newline at EOF. For now, I'll have another look at grammar theory :) – Jay05 Nov 05 '22 at 16:14