
I'm trying to wrap my head around the nom crate. The simple problem I'm trying to solve is to write a parser that counts the comment lines in a file. There are two types of comments to parse:

  1. Single line comments using //
  2. Multi-line comments using /* ... */

Here is the code I have so far:

use nom::{
  Err, IResult, Parser,
  branch::alt,
  bytes::complete::{is_not, tag, take_until},
  character::complete::{char, line_ending},
  combinator::{value, eof, map, not, success, all_consuming},
  error::{ErrorKind, ParseError},
  multi::many0,
  sequence::{pair, tuple, preceded, delimited, terminated},
};
// matches on single line comment
fn single_line_comment(s: &str) -> IResult<&str,&str> {
  preceded( tag("//"),is_not("\n\r"))(s)
}

// returns `1` for single line comment match
pub fn count_single_line_comment(i: &str) -> IResult<&str, usize> {
  value(1,
    single_line_comment
  )(i)
}

// matches on a multi-line comment
fn multi_line_comments(i: &str) -> IResult<&str, &str> {
  delimited(tag("/*"),take_until("*/"),tag("*/"))(i)
}

// returns the line count of a multi-line comment
fn count_multi_line_comments(i: &str) -> IResult<&str, usize> {
  map(multi_line_comments, |s| s.lines().count()) (i) 
}

// helper parser that matches on either of the two comment types
fn _count_comment_lines(s: &str) -> IResult<&str,usize> {
  alt((count_single_line_comment, count_multi_line_comments))(s)
}

// function I would like to write but I can't figure this part out
pub fn count_comment_lines(s: &str) -> IResult<&str, Vec<usize>> {
    many0(
      alt((_count_comment_lines, 
           preceded(
            not(_count_comment_lines),
           _count_comment_lines
          )))
    )(s)
}

My logic for the last parser is: match repeatedly on either a comment block, or on whatever leads up to a comment block (or up to the end of the file). But it doesn't work; running it on my sample strings doesn't yield the values I want.

static SAMPLE1: &str = "//Hello there!\n//What's going on?";
static SAMPLE2: &str = "/* What's the deal with airline food?\nIt keeps getting worse and worse\nI can't take it anymore!*/";
static SAMPLE3: &str = " //Global Variable\nlet x = 5;\n/*TODO:\n\t// Add the number of cats as a variable\n\t//Shouldn't take too long\n*/";
static SAMPLE4: &str = "//First\n//Second//NotThird\n//Third";
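Why the second alt branch in count_comment_lines can never match: not(p) succeeds only when p fails, and it consumes no input, so preceded(not(p), p) immediately reruns p on the very input p just rejected. The std-only sketch below models this. Parser, comment, not_p and preceded_not_then are hypothetical stand-ins for illustration, not nom's API, and the comment parser is simplified to recognising a leading //.

```rust
// Hypothetical toy model: a parser returns Some((rest, output)) on success
// and None on failure.
type Parser = fn(&str) -> Option<(&str, usize)>;

// Simplified stand-in for a comment parser: succeeds on input starting with "//".
fn comment(s: &str) -> Option<(&str, usize)> {
    s.strip_prefix("//").map(|rest| (rest, 1))
}

// Mirrors nom's not: succeeds without consuming input only if p fails.
fn not_p<'a>(p: Parser, s: &'a str) -> Option<(&'a str, ())> {
    match p(s) {
        Some(_) => None,
        None => Some((s, ())),
    }
}

// Mirrors preceded(not(p), p): run not(p), then p on whatever is left.
fn preceded_not_then<'a>(p: Parser, s: &'a str) -> Option<(&'a str, usize)> {
    let (rest, ()) = not_p(p, s)?;
    // not_p consumed nothing, so rest == s, and p has already failed on s.
    p(rest)
}

fn main() {
    for input in ["//comment", "let x = 5; //comment", ""] {
        // Prints None for every input.
        println!("{:?}", preceded_not_then(comment, input));
    }
}
```

In other words, every branch that can make progress has to consume something other than a comment, which is what a "skip until the next comment" parser provides.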

In my head, I want to use take_until, but that takes a tag (a literal string), not a parser. I don't see anything like take_until for parsers, so I'm thinking I have to rethink the combinator approach in general. What do you suggest?
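A take_until driven by a parser rather than a literal can be assembled in nom 7 from many_till(anychar, peek(p)): many_till tries its terminator before each anychar, and peek keeps that terminator from consuming input. The idea is easy to see in a std-only sketch that tries p at every character boundary and stops at the first position where it would succeed. Parser, line_comment and take_until_parser are hypothetical names for illustration, not part of nom.

```rust
// Hypothetical toy parser type: Some((rest, output)) on success.
type Parser = fn(&str) -> Option<(&str, usize)>;

// Stand-in comment parser: a "//" comment running to the end of the line.
fn line_comment(s: &str) -> Option<(&str, usize)> {
    let body = s.strip_prefix("//")?;
    let end = body.find('\n').unwrap_or(body.len());
    Some((&body[end..], 1))
}

// Parser-driven analogue of take_until. Returns (rest, skipped), where rest
// starts at the first position where p succeeds (or is empty if p never
// succeeds). The input p would match is NOT consumed.
fn take_until_parser<'a>(p: Parser, s: &'a str) -> (&'a str, &'a str) {
    for (i, _) in s.char_indices() {
        if p(&s[i..]).is_some() {
            return (&s[i..], &s[..i]);
        }
    }
    ("", s)
}

fn main() {
    let (rest, skipped) = take_until_parser(line_comment, "let x = 5; // five");
    // Prints: skipped "let x = 5; ", rest "// five"
    println!("skipped {:?}, rest {:?}", skipped, rest);
}
```

Unlike many_till, which collects the skipped characters into a Vec<char>, this sketch returns them as a slice; wrapping the nom version in nom's recognize combinator gives a similar slice.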

jks612



I figured out an answer:

use std::io;
use nom::{
  IResult,
  branch::alt,
  bytes::complete::{is_not, tag, take_until},
  character::complete::anychar,
  combinator::{value, eof, peek, opt},
  multi::{many0, many_till},
  sequence::{preceded, delimited, terminated},
};


fn get_input() -> String {
  let mut input = String::new();
  io::stdin()
    .read_line(&mut input)
    .expect("Failed to read line");
  input.trim().to_string()
}

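// match a single-line comment; each one counts as 1 line
// (opt lets an empty comment such as "//" followed by a newline match too)
fn parse_single_line_comment(s: &str) -> IResult<&str,usize> {
  value(1,
    preceded(tag("//"), opt(is_not("\n")))
  )(s)
}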

// match a multi-line comment, returning the number of lines in its body
fn parse_multi_line_comments(s: &str) -> IResult<&str, usize> {
  let (a,b) = 
  delimited(
    tag("/*"),
    take_until("*/"),
    tag("*/")
  )(s)?;

  Ok((a,b.lines().count()))
}

fn parse_comment(s: &str) -> IResult<&str, usize> {
  alt((parse_single_line_comment, parse_multi_line_comments))(s)
}

fn parse_end_of_file(s:&str) -> IResult<&str,usize> {
  value(0,eof)(s)
}

fn skip_not_comment(s: &str) -> IResult<&str, usize> {
  let (s_rest, _) = many_till(anychar, 
              peek(alt((parse_comment, parse_end_of_file)))
  )(s)?;
  Ok((s_rest,0))
}



pub fn extract_comments(s: &str) -> IResult<&str,usize> {
  let (tail,_) = opt(skip_not_comment)(s)?;

  let (rest, nums) = 
    many0( 
      terminated(
        parse_comment,
        opt(skip_not_comment)
      )
    )(tail)?;

  Ok((rest, nums.iter().sum()))
}

I still think it can be improved, but it passes the following tests:

static SAMPLES: [&str;8] = 
  [
    "No comments here",
    "//Hello there!\n//General Kenobi",
    "/* What's the deal with airline food?\nIt keeps getting worse and worse\nI can't take it anymore!*/",
    " //Global Variable\nlet x = 5;\n/*TODO:\n\t// Add the number of cats as a variable\n\t//Shouldn't take too long\n*/\nlet c = 500;",
    "//First\n//Second//NotThird\n//Third",
    "x = 3*4 /* not 3*5 */",
    "/* foo */ /* unterminated comment",
    ""
  ];

#[test]
pub fn count_all_comments() {
  let tests: Vec<usize> = SAMPLES.iter().map(|x| extract_comments(&x).unwrap().1).collect();
  assert_eq!(tests, vec![0,2,3,4,3,1,1,0])
}

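The trick is to consume input with many_till(anychar, peek(...)) until a comment (or the end of the input) comes next; peek makes sure the comment itself is not consumed. From there, alternate between parsing a comment and skipping non-comment text until the input is exhausted.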
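As a cross-check, the same counting rules can be written without nom as a plain loop, which is handy as a test oracle for the parser. This is a std-only sketch under these assumptions: a // comment runs to the next newline and counts as 1 (even when its body is empty), a terminated /* ... */ comment counts as body.lines().count(), and anything else, including an unterminated /*, is skipped one character at a time. count_comment_lines_plain is a hypothetical helper, not the answer's code.

```rust
// Hypothetical std-only oracle (not the answer's nom code) implementing
// the same counting rules with a plain loop.
fn count_comment_lines_plain(src: &str) -> usize {
    let mut rest = src;
    let mut total = 0;
    while !rest.is_empty() {
        if let Some(body) = rest.strip_prefix("//") {
            // Single-line comment: runs up to the next '\n' and counts as 1.
            let end = body.find('\n').unwrap_or(body.len());
            total += 1;
            rest = &body[end..];
        } else if let Some((body, after)) =
            rest.strip_prefix("/*").and_then(|b| b.split_once("*/"))
        {
            // Terminated block comment: count the lines of its body.
            total += body.lines().count();
            rest = after;
        } else {
            // No comment starts here (this includes an unterminated "/*"):
            // skip a single character and try again.
            let mut chars = rest.chars();
            chars.next();
            rest = chars.as_str();
        }
    }
    total
}

fn main() {
    let samples = [
        "No comments here",
        "//Hello there!\n//General Kenobi",
        "/* What's the deal with airline food?\nIt keeps getting worse and worse\nI can't take it anymore!*/",
        " //Global Variable\nlet x = 5;\n/*TODO:\n\t// Add the number of cats as a variable\n\t//Shouldn't take too long\n*/\nlet c = 500;",
        "//First\n//Second//NotThird\n//Third",
        "x = 3*4 /* not 3*5 */",
        "/* foo */ /* unterminated comment",
        "",
    ];
    let counts: Vec<usize> = samples.iter().map(|s| count_comment_lines_plain(s)).collect();
    // Same values the test above expects: [0, 2, 3, 4, 3, 1, 1, 0]
    println!("{:?}", counts);
}
```

Because the loop tries // before /* and only advances one character when neither matches, it should agree with the nom version on these samples; the two differ only where their rules differ (for example, an empty // comment).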
