I'd like to parse a numbered list using nom in Rust. For example: 1. Milk 2. Bread 3. Bacon
I could use separated_list1 with an appropriate separator parser and element parser.
fn parser(input: &str) -> IResult<&str, Vec<&str>> {
    preceded(
        tag("1. "),
        separated_list1(
            tuple((tag(" "), digit1, tag(". "))),
            take_while(is_alphabetic),
        ),
    )(input)
}
However, this does not validate that the index numbers increase one by one. For example, it would happily parse invalid lists like "1. Milk 3. Bread 4. Bacon" or "1. Milk 8. Bread 1. Bacon".
It seems there is no built-in nom parser that can do this. So I ventured to try to build my own first parser...
My idea was to implement a parser similar to separated_list1, but one that keeps track of the index and passes it to the separator as an argument. It could accept a closure that creates the separator parser from the index argument.
fn parser(input: &str) -> IResult<&str, Vec<&str>> {
    preceded(
        tag("1. "),
        separated_list1(
            |index: i32| tuple((tag(" "), tag(&index.to_string()), tag(". "))),
            take_while(is_alphabetic),
        ),
    )(input)
}
I started from the implementation of separated_list1 and changed the bound on the separator argument to G: FnOnce(i32) -> Parser<I, O2, E>. I then created an index variable (let mut index = 1;), passed it to sep(index) in the loop, and incremented it at the end of each iteration (index += 1;).
However, Rust's type system is not happy!
How can I make this work?
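My guess at what the compiler objects to, sketched without nom so it stays self-contained: a bare trait such as Parser<I, O2, E> is not a type that a closure can return, so the return type probably needs its own generic parameter, and an FnOnce is consumed by its first call, so it cannot be called again on the next loop iteration. The names numbered_items and numbered_prefix below are hypothetical, and closures returning Option<&str> stand in for real parsers.

```rust
/// Collects alphabetic items that are each introduced by the next number.
/// `G: FnMut(usize) -> P` can be called on every iteration, and `P` is a
/// separate type parameter constrained by a trait bound.
fn numbered_items<'a, G, P>(mut make_sep: G, mut input: &'a str) -> Vec<&'a str>
where
    G: FnMut(usize) -> P,
    P: FnMut(&'a str) -> Option<&'a str>,
{
    let mut items = Vec::new();
    let mut index = 1;
    loop {
        // Build a fresh separator for the number expected at this position.
        let mut sep = make_sep(index);
        let rest = match sep(input.trim_start()) {
            Some(rest) => rest,
            None => return items, // wrong or missing number: stop here
        };
        // The item is the leading run of alphabetic characters.
        let end = rest.find(|c: char| !c.is_alphabetic()).unwrap_or(rest.len());
        items.push(&rest[..end]);
        input = &rest[end..];
        index += 1;
    }
}

/// Hypothetical separator factory: the returned closure owns its prefix,
/// for example "2. ", and strips it from the front of the input.
fn numbered_prefix<'a>(index: usize) -> impl FnMut(&'a str) -> Option<&'a str> {
    let prefix = format!("{}. ", index);
    move |input: &'a str| input.strip_prefix(prefix.as_str())
}

fn main() {
    assert_eq!(
        numbered_items(numbered_prefix, "1. Milk 2. Bread 3. Bacon"),
        vec!["Milk", "Bread", "Bacon"]
    );
    // The list ends at the first number that is out of sequence.
    assert_eq!(
        numbered_items(numbered_prefix, "1. Milk 3. Bread 4. Bacon"),
        vec!["Milk"]
    );
}
```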
Here's the full code to reproduce the problem:
use nom::{
    error::{ErrorKind, ParseError},
    Err, IResult, InputLength, Parser,
};

pub fn separated_numbered_list1<I, O, O2, E, F, G>(
    mut sep: G,
    mut f: F,
) -> impl FnMut(I) -> IResult<I, Vec<O>, E>
where
    I: Clone + InputLength,
    F: Parser<I, O, E>,
    G: FnOnce(i32) -> Parser<I, O2, E>,
    E: ParseError<I>,
{
    move |mut i: I| {
        let mut res = Vec::new();
        let mut index = 1;

        // Parse the first element
        match f.parse(i.clone()) {
            Err(e) => return Err(e),
            Ok((i1, o)) => {
                res.push(o);
                i = i1;
            }
        }

        loop {
            let len = i.input_len();
            match sep(index).parse(i.clone()) {
                Err(Err::Error(_)) => return Ok((i, res)),
                Err(e) => return Err(e),
                Ok((i1, _)) => {
                    // infinite loop check: the parser must always consume
                    if i1.input_len() == len {
                        return Err(Err::Error(E::from_error_kind(i1, ErrorKind::SeparatedList)));
                    }

                    match f.parse(i1.clone()) {
                        Err(Err::Error(_)) => return Ok((i, res)),
                        Err(e) => return Err(e),
                        Ok((i2, o)) => {
                            res.push(o);
                            i = i2;
                        }
                    }
                }
            }
            index += 1;
        }
    }
}