
I've written a PEG.js grammar that is supposed to parse any kind of JS/C-style comment. However, it's not quite working: it matches a comment only when the comment is the entire input, and fails as soon as anything else follows. How should I alter this grammar so that it picks the comments out of any kind of input and ignores everything else?

Grammar:

Start
  = Comment

Character
  = .

Comment
  = MultiLineComment
  / SingleLineComment

LineTerminator
  = [\n\r\u2028\u2029]

MultiLineComment
  = "/*" (!"*/" Character)* "*/"

MultiLineCommentNoLineTerminator
  = "/*" (!("*/" / LineTerminator) Character)* "*/"

SingleLineComment
  = "//" (!LineTerminator Character)*

Input:

/**
 * Trending Content
 * Returns visible videos that have the largest view percentage increase over
 * the time period.
 */

Other text here

Error:

Line 5, column 4: Expected end of input but "\n" found.
The Puma

1 Answer


You need to restructure the grammar so that it captures each line's content up to the point where a comment (single-line or multi-line) begins, and only then matches the comment itself, as in:

lines = result:line* {
  return result
}

line = WS* line:$( !'//' CHAR )* single_comment ( EOL / EOF ) { // single-comment line
  return line.replace(/^\s+|\s+$/g,'')
}
/ WS* line:$( !'/*' CHAR )* multi_comment ( EOL / EOF ) { // multi-comment line
  return line.replace(/^\s+|\s+$/g,'')
}
/ WS* line:$CHAR+ ( EOL / EOF ) { // non-blank line
  return line.replace(/^\s+|\s+$/g,'')
}
/ WS* EOL { // blank line
  return ''
}

single_comment = WS* '//' CHAR* WS*

multi_comment = WS* '/*' ( !'*/' ( CHAR / EOL ) )* '*/' WS*

CHAR = [^\n]
WS = [ \t]
EOF = !.
EOL = '\n'

which, when run against:

no comment here

single line comment // single-comment HERE

test of multi line comment /*

  multi-comment HERE

*/

last line

returns:

[
  "no comment here",
  "",
  "single line comment",
  "",
  "test of multi line comment",
  "",
  "last line"
]
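
The grammar above keeps the text and drops the comments. If you want the opposite, as the question asks, the same scanning idea can be flipped around. Here is a minimal sketch in plain JavaScript (no PEG.js involved) that mirrors the question's MultiLineComment and SingleLineComment rules. The name extractComments is a hypothetical helper made up for this illustration, and the sketch assumes the input contains no string or regex literals, so a "//" inside quotes would be treated as a comment.

```javascript
// Hypothetical helper (not part of PEG.js): returns every comment in `input`.
// Assumption: string and regex literals are NOT recognized, so "//" or "/*"
// inside a quoted string is treated as the start of a comment.
function extractComments(input) {
  const comments = [];
  const lineTerminator = /[\n\r\u2028\u2029]/; // same set as LineTerminator
  let i = 0;
  while (i < input.length) {
    if (input.startsWith('/*', i)) {
      // MultiLineComment: "/*" (!"*/" Character)* "*/"
      const close = input.indexOf('*/', i + 2);
      if (close !== -1) {
        comments.push(input.slice(i, close + 2));
        i = close + 2;
        continue;
      }
      // Unterminated "/*": fall through and treat it as ordinary text.
    } else if (input.startsWith('//', i)) {
      // SingleLineComment: "//" (!LineTerminator Character)*
      let end = i + 2;
      while (end < input.length && !lineTerminator.test(input[end])) {
        end++;
      }
      comments.push(input.slice(i, end));
      i = end;
      continue;
    }
    i++; // any other character is skipped
  }
  return comments;
}
```

Called on the question's input, this should return an array with one element, the whole `/** ... */` block, while "Other text here" is skipped.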
Rob Raisch