Parsing a variably space delimited list with nom

Question

How can I consume a list of tokens that may or may not be separated by a space?

I'm trying to parse Chinese romanization (pinyin) in the cedict format with nom (6.1.2). For example, "ni3 hao3 ma5", which, due to human error in transcription, is sometimes written as "ni3hao3ma5" or "ni3hao3 ma5" (note the variable spacing).

I have written a parser that handles individual syllables, e.g. ["ni3", "hao3", "ma5"], and I'm trying to use nom::multi::separated_list0 to parse the whole string like so:

nom::multi::separated_list0(
    nom::character::complete::space0,
    syllable,
)(i)?;

However, I get an Err(Error(Error { input: "", code: SeparatedList })) after all the tokens have been consumed.

score 1 · Answer 1 · answered Apr 23 '21 at 20:31

The problem with using

nom::multi::separated_list0(
    nom::character::complete::space0,
    syllable,
)(i)?;

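is that the space0 delimiter also matches the empty string. separated_list0 guards against infinite loops: if the separator succeeds without consuming any input, it stops and returns an error with ErrorKind::SeparatedList. Once the last syllable has been parsed, space0 succeeds on the remaining empty input without consuming anything, which trips that guard, hence the Err(Error(Error { input: "", code: SeparatedList })). The same check fires between syllables written with no space at all, as in "ni3hao3ma5".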
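To see the guard in action without pulling in nom, here is a dependency-free sketch of the loop separated_list0 runs. It is a simplified model written for illustration, not nom's actual source: space0, syllable and separated_list0 below are hypothetical std-only stand-ins that only mimic the nom parsers of the same names, and the error is a plain String rather than nom's error type.

```rust
// Dependency-free model of a separated-list loop with a progress guard,
// loosely following nom's separated_list0. NOT nom's actual source.

type PResult<'a> = Result<(&'a str, &'a str), String>;

// Stand-in for nom's space0: zero or more spaces/tabs, so it always succeeds.
fn space0(i: &str) -> PResult<'_> {
    let end = i.find(|c: char| c != ' ' && c != '\t').unwrap_or(i.len());
    Ok((&i[end..], &i[..end]))
}

// Stand-in syllable parser: one or more ASCII letters, then zero or more digits.
fn syllable(i: &str) -> PResult<'_> {
    let letters = i.find(|c: char| !c.is_ascii_alphabetic()).unwrap_or(i.len());
    if letters == 0 {
        return Err(format!("syllable: no letters at {:?}", i));
    }
    let rest = &i[letters..];
    let digits = rest.find(|c: char| !c.is_ascii_digit()).unwrap_or(rest.len());
    let len = letters + digits;
    Ok((&i[len..], &i[..len]))
}

fn separated_list0<'a>(
    sep: impl Fn(&'a str) -> PResult<'a>,
    item: impl Fn(&'a str) -> PResult<'a>,
    input: &'a str,
) -> Result<(&'a str, Vec<&'a str>), String> {
    let mut out = Vec::new();
    // The *0 variant accepts zero items.
    let mut i = match item(input) {
        Ok((rest, o)) => {
            out.push(o);
            rest
        }
        Err(_) => return Ok((input, out)),
    };
    loop {
        let after_sep = match sep(i) {
            Ok((rest, _)) => rest,
            Err(_) => return Ok((i, out)),
        };
        // Progress guard: a separator that consumed nothing could make the loop
        // spin forever, so fail instead (nom reports ErrorKind::SeparatedList).
        if after_sep.len() == i.len() {
            return Err(format!("SeparatedList at {:?}", after_sep));
        }
        match item(after_sep) {
            Ok((rest, o)) => {
                out.push(o);
                i = rest;
            }
            // Item failed after a separator: stop before that separator.
            Err(_) => return Ok((i, out)),
        }
    }
}

fn main() {
    // Input fully consumed, then space0 matches "" at the end and trips the guard.
    println!("{:?}", separated_list0(space0, syllable, "ni3 hao3 ma5"));
    // With no spaces at all, the guard trips right after the first syllable.
    println!("{:?}", separated_list0(space0, syllable, "ni3hao3ma5"));
}
```

A separator that must consume something would avoid the guard for "ni3 hao3 ma5", but it would then reject "ni3hao3ma5", so a zero-width separator is the wrong tool for this input either way.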

The solution in my case was to use nom::multi::many1 instead of nom::multi::separated_list0, and to handle the optional spaces inside the inner parser, like so:

use nom::{character, multi, sequence, IResult};

// `Syllable` and `Syllable::new` are my own type (definition not shown).

fn syllables(i: &str) -> IResult<&str, Vec<Syllable>> {
    // many1 instead of separated_list0
    multi::many1(syllable)(i)
}

fn syllable(i: &str) -> IResult<&str, Syllable> {
    let (rest, (_, pronunciation, tone)) = sequence::tuple((
        // the optional leading space is handled here
        character::complete::space0,
        character::complete::alpha1,
        character::complete::digit0,
    ))(i)?;

    Ok((rest, Syllable::new(pronunciation, tone)))
}
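For comparison, here is the same many1-plus-optional-space idea as a std-only sketch, so it can be run without nom. Syllable here is a hypothetical stand-in for the asker's type (just borrowed pronunciation and tone slices), and syllable/syllables only imitate the nom combinators above; they are not nom code. The point it illustrates: all three spellings from the question produce the same syllables.

```rust
// Dependency-free sketch of the answer's approach: many1 over a syllable
// parser that skips optional leading spaces itself. Not nom code.

/// Hypothetical stand-in for the asker's `Syllable` type.
#[derive(Debug, PartialEq)]
struct Syllable<'a> {
    pronunciation: &'a str,
    tone: &'a str,
}

/// Mimics tuple((space0, alpha1, digit0)); returns the unconsumed rest.
fn syllable(i: &str) -> Option<(&str, Syllable<'_>)> {
    // space0: skip any leading spaces or tabs (may consume nothing)
    let i = i.trim_start_matches(|c: char| c == ' ' || c == '\t');
    // alpha1: at least one ASCII letter, otherwise fail
    let n = i.find(|c: char| !c.is_ascii_alphabetic()).unwrap_or(i.len());
    if n == 0 {
        return None;
    }
    let (pronunciation, rest) = i.split_at(n);
    // digit0: zero or more digits for the tone
    let d = rest.find(|c: char| !c.is_ascii_digit()).unwrap_or(rest.len());
    let (tone, rest) = rest.split_at(d);
    Some((rest, Syllable { pronunciation, tone }))
}

/// Mimics many1(syllable): repeat until it fails; fail if it never matched.
fn syllables(mut i: &str) -> Option<(&str, Vec<Syllable<'_>>)> {
    let mut out = Vec::new();
    while let Some((rest, s)) = syllable(i) {
        // syllable always consumes at least one letter, so this terminates
        i = rest;
        out.push(s);
    }
    if out.is_empty() {
        None
    } else {
        Some((i, out))
    }
}

fn main() {
    for input in ["ni3 hao3 ma5", "ni3hao3ma5", "ni3hao3 ma5"] {
        println!("{:?}", syllables(input));
    }
}
```

Because syllable always consumes at least one letter, the loop always makes progress, which is also why nom's many1 (which has a comparable no-progress check) accepts it. Trailing spaces are not consumed and stay in the remaining input.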
