I'd like to parse a numbered list using nom in Rust.

For example, 1. Milk 2. Bread 3. Bacon.

I could use separated_list1 with an appropriate separator parser and element parser.

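use nom::{
    bytes::complete::{tag, take_while},
    character::complete::digit1,
    multi::separated_list1,
    sequence::{preceded, tuple},
    IResult,
};

fn parser(input: &str) -> IResult<&str, Vec<&str>> {
    preceded(
        tag("1. "),
        separated_list1(
            tuple((tag(" "), digit1, tag(". "))),
            take_while(char::is_alphabetic),
        ),
    )(input)
}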

However, this does not validate the increasing index numbers.

For example, it would happily parse invalid lists like 1. Milk 3. Bread 4. Bacon or 1. Milk 8. Bread 1. Bacon.

It seems there is no built-in nom parser that can do this. So I ventured to try to build my own first parser...

My idea was to implement a parser similar to separated_list1 but which keeps track of the index and passes it to the separator as argument. It could accept a closure as argument that can then create the separator parser based on the index argument.

fn parser(input: &str) -> IResult<&str, Vec<&str>> {
    preceded(
        tag("1. "),
        separated_list1(
            |index: i32| tuple((tag(" "), tag(&index.to_string()), tag(". "))),
            take_while(is_alphabetic),
        ),
    )(input)
}

I started from the implementation of separated_list1, changed the separator argument to G: FnOnce(i32) -> Parser<I, O2, E>, created an index variable (let mut index = 1;), passed it to the separator in the loop (sep(index)), and incremented it at the end of each iteration (index += 1;).

However, Rust's type system is not happy!

How can I make this work?


Here's the full code for reproduction:

use nom::{
    error::{ErrorKind, ParseError},
    Err, IResult, InputLength, Parser,
};

pub fn separated_numbered_list1<I, O, O2, E, F, G>(
    mut sep: G,
    mut f: F,
) -> impl FnMut(I) -> IResult<I, Vec<O>, E>
where
    I: Clone + InputLength,
    F: Parser<I, O, E>,
    G: FnOnce(i32) -> Parser<I, O2, E>,
    E: ParseError<I>,
{
    move |mut i: I| {
        let mut res = Vec::new();
        let mut index = 1;

        // Parse the first element
        match f.parse(i.clone()) {
            Err(e) => return Err(e),
            Ok((i1, o)) => {
                res.push(o);
                i = i1;
            }
        }

        loop {
            let len = i.input_len();
            match sep(index).parse(i.clone()) {
                Err(Err::Error(_)) => return Ok((i, res)),
                Err(e) => return Err(e),
                Ok((i1, _)) => {
                    // infinite loop check: the parser must always consume
                    if i1.input_len() == len {
                        return Err(Err::Error(E::from_error_kind(i1, ErrorKind::SeparatedList)));
                    }

                    match f.parse(i1.clone()) {
                        Err(Err::Error(_)) => return Ok((i, res)),
                        Err(e) => return Err(e),
                        Ok((i2, o)) => {
                            res.push(o);
                            i = i2;
                        }
                    }
                }
            }
            index += 1;
        }
    }
}
mdcq

1 Answer

Try combining many1(), separated_pair(), and verify() manually:

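use std::cell::Cell;

use nom::{
    bytes::complete::tag,
    character::complete::{alpha1, digit1, multispace0},
    combinator::{map_res, verify},
    multi::many1,
    sequence::{preceded, separated_pair},
    IResult,
};

fn validated(input: &str) -> IResult<&str, Vec<(u32, &str)>> {
    // `verify` takes an `Fn` closure, so the expected index lives in a `Cell`
    // and can be updated through a shared reference.
    let current_index = Cell::new(1u32);
    let number = map_res(digit1, |s: &str| s.parse::<u32>());
    let valid = verify(number, |digit: &u32| {
        let i = current_index.get();
        if *digit == i {
            current_index.set(i + 1);
            true
        } else {
            false
        }
    });
    let pair = preceded(multispace0, separated_pair(valid, tag(". "), alpha1));
    // Bind the result to a temporary so the parser borrowing `current_index` is
    // dropped before `current_index` itself. This will not compile without the temporary binding.
    let tmp = many1(pair)(input);
    tmp
}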

#[test]
fn test_success() {
    let input = "1. Milk 2. Bread 3. Bacon";
    assert_eq!(validated(input), Ok(("", vec![(1, "Milk"), (2, "Bread"), (3, "Bacon")])));
}

#[test]
fn test_fail() {
    let input = "2. Bread 3. Bacon 1. Milk";
    validated(input).unwrap_err();
}
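
For comparison, the idea from the question (a separator built from the current index) can be sketched without nom. The bound `G: FnOnce(i32) -> Parser<I, O2, E>` has two problems: `Parser` is a trait rather than a concrete type, so the factory's return type needs its own type parameter, and the factory is called once per separator, so it has to be `FnMut` rather than `FnOnce`. The sketch below uses only the standard library; `PResult`, `word`, `numbered_sep`, and `numbered_list` are hypothetical names made up for illustration. With nom 7 the analogous bounds would presumably be `G: FnMut(u32) -> P` and `P: Parser<I, O2, E>`. The counter starts at 2 because the first separator introduces the second item.

```rust
/// Hypothetical mini parser result: remaining input plus output, or `None` on failure.
type PResult<'a, O> = Option<(&'a str, O)>;

/// Like a separated list, but the separator parser is rebuilt from a running index.
fn separated_numbered_list1<'a, O, P, G, F>(
    mut make_sep: G,
    mut elem: F,
) -> impl FnMut(&'a str) -> PResult<'a, Vec<O>>
where
    G: FnMut(u32) -> P,                   // factory, called once per separator
    P: FnMut(&'a str) -> PResult<'a, ()>, // the parser it returns gets its own type parameter
    F: FnMut(&'a str) -> PResult<'a, O>,
{
    move |input: &'a str| {
        // The first element is mandatory.
        let (mut rest, first) = elem(input)?;
        let mut items = vec![first];
        let mut index = 2; // the first separator introduces item number 2
        loop {
            let mut sep = make_sep(index);
            // Stop (without failing) as soon as the separator or the element does not match.
            let Some((after_sep, ())) = sep(rest) else { return Some((rest, items)) };
            let Some((after_elem, item)) = elem(after_sep) else { return Some((rest, items)) };
            items.push(item);
            rest = after_elem;
            index += 1;
        }
    }
}

/// Element parser: one or more alphabetic characters.
fn word<'a>(input: &'a str) -> PResult<'a, &'a str> {
    let end = input.find(|c: char| !c.is_alphabetic()).unwrap_or(input.len());
    if end == 0 { None } else { Some((&input[end..], &input[..end])) }
}

/// Separator factory: matches exactly " <index>. ".
fn numbered_sep<'a>(index: u32) -> impl FnMut(&'a str) -> PResult<'a, ()> {
    let expected = format!(" {index}. ");
    move |input: &'a str| input.strip_prefix(expected.as_str()).map(|rest| (rest, ()))
}

/// Parses "1. Milk 2. Bread 3. Bacon" into its items.
fn numbered_list<'a>(input: &'a str) -> PResult<'a, Vec<&'a str>> {
    let rest = input.strip_prefix("1. ")?;
    separated_numbered_list1(numbered_sep, word)(rest)
}
```

As with separated_list1, a wrong index does not fail the whole parse in this sketch; the list simply ends at the last valid item and the rest is returned as unparsed input.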
MeetTitan
  • Thanks! Your solution is clever. It feels like using `Cell` for global state and a temporary binding to make it all work is fighting Rust a bit though. With your previous suggestion I was able to implement a simple `map_res`, checking if the numbers in the vec are all properly increasing, e.g. `vec.iter().enumerate().all(|(i, val)| val.1 == i + 1)`, and returning a custom error if not, e.g. `Err(nom::Err::Error(MyError::IncreasingNumberedList))`. Not as good as yours though which immediately errors upon an invalid index. – mdcq Feb 07 '23 at 22:04
  • We're more fighting the signature of `verify()` here, since it accepts an `Fn` and not an `FnMut`, meaning we can't mutably close over any out-of-scope values. Also, the compiler *may* elide the cell completely in release mode. I admit to the inelegance of reading this solution, though. – MeetTitan Feb 07 '23 at 23:01
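
The post-validation approach mentioned in the first comment can be sketched roughly as follows: parse every `(index, item)` pair first, then check the numbering in one pass. This is only an illustration using the standard library; `ListError`, `indices_are_sequential`, and `check_numbering` are hypothetical names (the error variant borrows its name from the comment). A function shaped like `check_numbering` could be handed to nom's `map_res`, which accepts a closure returning a `Result`.

```rust
/// Hypothetical error type for a list whose numbering is not 1, 2, 3, ...
#[derive(Debug, PartialEq)]
enum ListError {
    IncreasingNumberedList,
}

/// Returns true when the indices are exactly 1, 2, 3, ... in order.
/// An empty slice counts as sequential.
fn indices_are_sequential(items: &[(u32, &str)]) -> bool {
    items
        .iter()
        .zip(1u32..)
        .all(|((index, _), expected)| *index == expected)
}

/// Drops the indices if they are sequential, otherwise reports an error.
fn check_numbering<'a>(items: Vec<(u32, &'a str)>) -> Result<Vec<&'a str>, ListError> {
    if indices_are_sequential(&items) {
        Ok(items.into_iter().map(|(_, text)| text).collect())
    } else {
        Err(ListError::IncreasingNumberedList)
    }
}
```

The trade-off is the one the comment describes: the whole list is parsed before the numbering is checked, whereas the `verify`-based answer rejects a wrong index as soon as it is read.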