
I want to parse the text of a file that contains newlines. The file could have Windows or Unix line endings, but for now it is a Windows file with these contents:

(**************
***************)

The file contents above have been read in with slurp and contain a newline. Here is the grammar that I am trying to use:

S = start-comment stars <inside-comment>
start-comment = '('
stars = '*' +
<inside-comment> = '\n' +

This grammar is also slurped in from a file, which I believe makes things a little easier:

"The only escape characters needed are the ordinary escape characters for strings and regular expressions (additionally, instaparse also supports \' inside single-quoted strings)."

The newline does not seem to be parsed:

Parse error at line 1, column 16:
(**************
               ^
Expected one of:
"\n"
"*"

What do I need to set <inside-comment> to so that the error comes on the first star of the second line, which will indicate that the grammar has recognized the newline?

Chris Murphy

2 Answers


Newlines in Windows show up as \r\n and in Unix as \n. So you need something like this:

#'\r?\n'
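
As a rough illustration of that rule inside a complete grammar, here is a minimal sketch. It assumes instaparse is on the classpath; the rule name `newline` and the two sample inputs are made up for the example. The grammar is written as a Clojure string here, so every backslash is doubled.

```clojure
;; Sketch only: assumes instaparse is available as a dependency.
(require '[instaparse.core :as insta])

(def comment-parser
  (insta/parser
    "S = start-comment stars <newline> stars end-comment
     start-comment = '('
     end-comment = ')'
     stars = '*'+
     newline = #'\\r?\\n'"))

;; Hypothetical inputs: the same comment with Windows and Unix line endings.
(def windows-text "(**************\r\n***************)")
(def unix-text    "(**************\n***************)")

;; insta/failure? reports whether a parse result is a failure object.
(println (insta/failure? (comment-parser windows-text)))
(println (insta/failure? (comment-parser unix-text)))
```

With the optional `\r` in the regex, neither input should be reported as a failure.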

Double the backslashes if your grammar is inside a string:

"some-rule = #'\\r?\\n'"
Rory O'Kane
puzzler

This parses to the end:

S = start-comment stars <inside-comment-1> stars end-comment
start-comment = '('
end-comment = ')'
stars = '*' +
<inside-comment-1> = '\n' | '\r\n'
<inside-comment-2> = '\r?\n'

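Note that <inside-comment-2> does not work, most likely because '\r?\n' without a leading # is a string literal rather than a regular expression; #'\r?\n' would be the regex form. While <inside-comment-1> works, is there a more elegant way of getting past a newline?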
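
To compare the quoted form with the #-prefixed form directly, a small sketch along these lines may help. It assumes instaparse is on the classpath, and the rule name `nl` and the parser names are hypothetical. Both grammars are given as Clojure strings, hence the doubled backslashes.

```clojure
;; Sketch only: assumes instaparse is available as a dependency.
(require '[instaparse.core :as insta])

;; Quoted form: a string literal, so it matches CR, a literal '?', then LF.
(def literal-parser (insta/parser "nl = '\\r?\\n'"))

;; #-prefixed form: a regular expression, so the CR is optional.
(def regex-parser (insta/parser "nl = #'\\r?\\n'"))

;; A Windows line ending should fail against the literal but not the regex.
(println (insta/failure? (literal-parser "\r\n")))
(println (insta/failure? (regex-parser "\r\n")))

;; A Unix line ending should also be accepted by the regex form.
(println (insta/failure? (regex-parser "\n")))
```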

Chris Murphy