I'm trying to wrap my head around how to handle C-style multiline comments (`/* */`) with a recursive descent parser. Since these comments can appear anywhere, how do you account for them? For example, suppose you're parsing a sentence into word tokens: what do you do if there's a comment inside a word?

Ex.

This is a sentence = word word word word

vs

This is a sen/*sible*/tence = ???

Thanks!

  • Did you write a lexer/tokenizer first? You could just ignore anything between `/*` and `*/` when breaking your program text into tokens. – eigenchris Mar 06 '15 at 03:29

1 Answer

In C, like pretty well every other programming language, a comment is effectively whitespace; a comment cannot occur within a token.

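So comments cannot interrupt the parsing of a token; they only need to be recognized and ignored. In your example, `sen/*sible*/tence` is two tokens, `sen` and `tence`, exactly as if the comment were a space.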
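To make the "comment is whitespace" idea concrete, here is a minimal sketch of a word tokenizer that skips `/* ... */` while scanning. It is not taken from any particular compiler; the function name `tokenize` and the rule that a token is any run of non-whitespace characters are assumptions made for illustration.

```python
def tokenize(text):
    """Split text into word tokens, treating /* ... */ comments as whitespace.

    Hypothetical helper: a "word" here is any run of characters that is
    neither whitespace nor the start of a comment.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        if text.startswith("/*", i):
            # Skip the whole comment, exactly as if it were a space.
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 2
        elif text[i].isspace():
            i += 1
        else:
            # Consume a word; a comment opener ends the word just like a space.
            start = i
            while i < n and not text[i].isspace() and not text.startswith("/*", i):
                i += 1
            tokens.append(text[start:i])
    return tokens


print(tokenize("This is a sen/*sible*/tence"))
```

Because the comment is handled entirely inside the tokenizer, a recursive descent parser built on top of it never sees a comment and needs no special cases for one.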

rici
  • So if I want to still keep track of comments and where they are in the text, should I do two passes through the text? One ignoring comments, and the other only looking for comments? – John Wonderick Mar 06 '15 at 05:02
  • @JohnWonderick You can keep a separate list of where comments are without a second pass. But comments really are irrelevant to parsing. If you are trying to build a pretty-printer or some such, you might create a linked list/vector of tokens as you tokenize, but do the parse itself only with meaningful tokens. – rici Mar 06 '15 at 05:34
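A single-pass version of that suggestion might look like the sketch below. The names `Token`, `scan`, and `meaningful` are hypothetical, and "word" tokens are again assumed to be runs of non-whitespace characters. The scanner records every token, comments included, with its offset; the parser is then given only the filtered list.

```python
from typing import NamedTuple


class Token(NamedTuple):
    kind: str    # "word" or "comment" (illustrative token kinds)
    text: str
    offset: int  # index of the token's first character in the source


def scan(source):
    """Return all tokens in source order, including comments, in one pass."""
    tokens = []
    i, n = 0, len(source)
    while i < n:
        if source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            # Keep the comment and its position instead of discarding it.
            tokens.append(Token("comment", source[i:end + 2], i))
            i = end + 2
        elif source[i].isspace():
            i += 1
        else:
            start = i
            while i < n and not source[i].isspace() and not source.startswith("/*", i):
                i += 1
            tokens.append(Token("word", source[start:i], start))
    return tokens


def meaningful(tokens):
    """Filter out comments; this is the list the parser would consume."""
    return [t for t in tokens if t.kind != "comment"]


all_tokens = scan("a /* note */ b")
print(all_tokens)              # full list, useful for a pretty-printer
print(meaningful(all_tokens))  # what the parser sees
```

A pretty-printer can walk the full list to put comments back where they were, while the parse itself stays unaware of them.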