Parse multiline comment with nom

Question

I'm trying to write a nom parser that recognizes multiline comments...

/*
yo!
*/

...and consumes/discards (same thing, right?) the result:

use nom::{
  bytes::complete::{tag, take_until},
  error::{ErrorKind, ParseError},
  sequence::preceded,
  IResult,
};

fn multiline_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, &'a str, E> {
    preceded(tag("/*"), take_until("*/"))(i)
}

This almost works. I know the take_until stops just before the */, but I don't know what to do to make it include it.

#[test]
fn should_consume_multiline_comments() {
    assert_eq!(
        multiline_comment::<(&str, ErrorKind)>("/*abc\n\ndef*/"),
        Ok(("", "/*abc\n\ndef*/"))
    );
}

gives the result

thread 'should_consume_multiline_comments' panicked at 'assertion failed: `(left == right)`
left: `Ok(("*/", "abc\n\ndef"))`,
right: `Ok(("", "/*abc\n\ndef*/"))`'

So my question is: how do I get the full comment, including the ending */?

Thanks!

score 3 · Accepted Answer · answered Oct 27 '19 at 23:41

I am assuming that you don't really want the returned string to have both preceding /* and closing */ intact - since preceded always discards the match from the first parser, you are never going to get that using preceded. I assume your main concern is ensuring that the closing */ is actually consumed.

For this you can use delimited rather than preceded:

use nom::{bytes::complete::{is_not, tag}, error::ParseError, sequence::delimited, IResult};

fn multiline_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, &'a str, E> {
    delimited(tag("/*"), is_not("*/"), tag("*/"))(i)
}

This passes this test:

assert_eq!(
    multiline_comment::<(&str, ErrorKind)>("/*abc\n\ndef*/"),
    Ok(("", "abc\n\ndef"))
);

so you can be sure that the closing */ has been consumed.
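To make the "consumed" part concrete, here is a minimal std-only sketch of what the delimited shape does, written without nom so it can be read in isolation. The function names comment_body and full_comment are hypothetical, and the body is located by searching for the whole "*/" substring; this is a hand-rolled model, not nom's implementation. The second function returns the whole comment including both delimiters, which is what the question literally asked for (in nom, wrapping the parser in the recognize combinator is the usual way to get the consumed slice).

```rust
/// Hand-rolled stand-in for a "/*" ... "*/" delimited parser (not nom code).
/// Returns `(remaining input, comment body)`, or `None` if the input does not
/// start with a complete comment.
fn comment_body(input: &str) -> Option<(&str, &str)> {
    // Match and drop the opening delimiter.
    let after_open = input.strip_prefix("/*")?;
    // Find the first occurrence of the whole closing delimiter.
    let end = after_open.find("*/")?;
    let body = &after_open[..end];
    // Skip past the two bytes of "*/" so the delimiter is consumed.
    let rest = &after_open[end + 2..];
    Some((rest, body))
}

/// Same match, but returns the whole comment including both delimiters.
fn full_comment(input: &str) -> Option<(&str, &str)> {
    let (rest, _body) = comment_body(input)?;
    // Everything that is no longer in `rest` was consumed by the match.
    let consumed = input.len() - rest.len();
    Some((rest, &input[..consumed]))
}

#[test]
fn comment_sketch_consumes_closing_delimiter() {
    let src = "/*abc\n\ndef*/let x = 1;";
    assert_eq!(comment_body(src), Some(("let x = 1;", "abc\n\ndef")));
    assert_eq!(full_comment(src), Some(("let x = 1;", "/*abc\n\ndef*/")));
    assert_eq!(comment_body("/* unterminated"), None);
}
```

In both cases the remaining input starts after the closing */, so a caller that only wants to skip the comment can ignore the second tuple element.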

You are correct that I didn't really want the open and closing /* ... */. I just wanted the parser to consume the comment and discard it for all I care. Thank you! — mottosson, Oct 28 '19 at 07:28

score 0 · Answer 2 · answered Jun 22 '23 at 14:59

Just to add on to the answer from harmic, you would want to do:

use nom::{bytes::complete::{tag, take_until}, error::ParseError, sequence::delimited, IResult};

fn multiline_comment<'a, E: ParseError<&'a str>>(s: &'a str) -> IResult<&'a str, &'a str, E> {
    delimited(tag("/*"), take_until("*/"), tag("*/"))(s)
}

Note that this uses take_until() instead of is_not().

is_not treats its argument as a set of characters and stops at the first character that appears anywhere in that set. If the comment contains a * or a / on its own, is_not stops there, and the closing tag("*/") then fails to match.

The original question already used take_until; I just wanted to make this clearer for anyone who comes across it later.
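The difference between the two can be modelled with plain string searches. The sketch below is std-only and only imitates the matching rules as I understand them (take_until looks for the whole substring, is_not for any single character of the set, and is_not rejects an empty match); the function names until_substring and until_any_char are hypothetical, not nom APIs.

```rust
/// Model of `take_until("*/")`: everything before the first occurrence of the
/// whole substring "*/". Fails (`None`) if the substring never appears.
fn until_substring(input: &str) -> Option<&str> {
    input.find("*/").map(|end| &input[..end])
}

/// Model of `is_not("*/")`: the longest prefix that contains neither '*' nor
/// '/'. An empty prefix is treated as a failure.
fn until_any_char(input: &str) -> Option<&str> {
    let end = input
        .find(|c: char| c == '*' || c == '/')
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(&input[..end])
    }
}

#[test]
fn char_set_stops_earlier_than_substring() {
    // Comment body with a lone '*' and a lone '/' before the real terminator.
    let body = "a * b / c */ rest";
    // Substring search runs up to the real "*/".
    assert_eq!(until_substring(body), Some("a * b / c "));
    // Character-set search stops at the first lone '*'; what follows is "* b",
    // which is not "*/", so a closing-delimiter match would fail here.
    assert_eq!(until_any_char(body), Some("a "));
}
```

The same model suggests a second difference: for an empty comment such as /**/, the substring search yields an empty body, while the character-set search has nothing to match.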
