I'm trying to learn nom and have a problem where take_while does not accept is_digit or any other is_xxxx.

I have rows that I want to parse that look like this:

#123 = ABCDEF (...);

where I want to get the '123' part (and eventually the ABCDEF and the (...) parts as well, but one thing at a time, I guess).

My parser currently looks like this

use nom::{
  bytes::complete::take_while,
  character::is_digit,
  error::ParseError,
  IResult
};

// Get row id
fn id<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, &'a str, E> {
    take_while(is_digit)(i)
}

The is_digit definition looks like this

pub fn is_digit(chr: u8) -> bool

Since the id parser takes a &str, the compiler complains about the type mismatch. Is it possible to use is_digit anyway? Can I do a type conversion somewhere without having to allocate anything? I really want this to be as efficient as possible.

It feels like the provided is_xxxx functions should be used in these kinds of situations, but I might be wrong about it.

Thanks!

mottosson

3 Answers

This doesn't directly answer your question, since it doesn't use take_while, but you could use the digit1 parser, nom::character::complete::digit1.

It takes a &str, consumes one or more ASCII digits (0-9), and returns the matched digits as a &str together with the remaining input. It fails if the input does not start with a digit.
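
To make the "one or more digits" behaviour concrete, here is a minimal std-only sketch of the same idea. It is not nom's implementation: `digits1` and `row_id` are hypothetical names introduced for illustration, and `None` stands in for the error that a nom parser would return.

```rust
/// Hypothetical stand-in for what `digit1` does on a `&str`: match one or
/// more ASCII digits and return (remaining input, matched digits).
/// Returns `None` where a nom parser would return an error.
fn digits1(input: &str) -> Option<(&str, &str)> {
    // Byte index of the first non-digit, or the input length if all digits.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None; // "one or more": an empty match counts as a failure
    }
    // Both slices borrow from `input`; nothing is allocated.
    Some((&input[end..], &input[..end]))
}

/// Hypothetical row-id helper: skip the leading '#', then take the digits.
fn row_id(row: &str) -> Option<(&str, &str)> {
    digits1(row.strip_prefix('#')?)
}

fn main() {
    println!("{:?}", row_id("#123 = ABCDEF (...);"));
    println!("{:?}", row_id("#= ABCDEF"));
}
```

The returned slices point into the original row, which is why this style of parsing needs no allocation.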

Steve

You can easily adapt is_digit to take a char. All digits are ASCII, so first check that the character is ASCII; if it is, it can be safely converted to u8.

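use nom::character::is_digit; // pub fn is_digit(chr: u8) -> bool

// Adapt the byte-based is_digit to a char predicate.
pub fn is_char_digit(chr: char) -> bool {
    // Check ASCII first: `chr as u8` truncates non-ASCII chars.
    chr.is_ascii() && is_digit(chr as u8)
}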

You could also use the AsChar trait method is_dec_digit, which for char is just a wrapper around char::is_digit(10).
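
The following std-only sketch shows why the ASCII check matters and that the adapter needs no allocation. It makes two assumptions so that it compiles without the nom crate: the local `is_digit` is a stand-in with the same signature as nom's, and `split_while` is a hypothetical helper that only mimics the shape of what `take_while` returns on a `&str`.

```rust
/// Local stand-in for nom's `character::is_digit` (same signature), so the
/// sketch compiles without the nom crate.
fn is_digit(chr: u8) -> bool {
    chr.is_ascii_digit()
}

/// Adapter from the answer: check ASCII first, so the cast to u8 is lossless.
fn is_char_digit(chr: char) -> bool {
    chr.is_ascii() && is_digit(chr as u8)
}

/// The same adapter without the ASCII check, shown only to expose the bug.
fn is_char_digit_unchecked(chr: char) -> bool {
    is_digit(chr as u8)
}

/// Hypothetical helper mimicking `take_while` on a `&str`: returns
/// (remaining input, matched prefix), both borrowed from `input`.
fn split_while(input: &str, pred: impl Fn(char) -> bool) -> (&str, &str) {
    let end = input
        .char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(input.len(), |(i, _)| i);
    (&input[end..], &input[..end])
}

fn main() {
    let (rest, id) = split_while("123 = ABCDEF (...);", is_char_digit);
    println!("id = {:?}, rest = {:?}", id, rest);

    // U+0131 truncates to 0x31 (ASCII '1') when cast to u8.
    println!("{}", is_char_digit_unchecked('\u{0131}'));
    println!("{}", is_char_digit('\u{0131}'));
}
```

Without the `is_ascii` guard, the cast keeps only the low byte of the code point, so a non-ASCII character such as U+0131 would be misclassified as a digit.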

Alex Huszagh
  • That looks like a nifty technique. Does it have any performance implications? – mottosson Sep 07 '19 at 20:31
  • Not particularly, as long as you inline it. If you want it to be faster, you can compare directly using `char.is_digit` (or `is_dec_digit`), which should be slightly faster since it means fewer comparisons. In fact, this is pretty much how nom does it (using is_dec_digit): https://docs.rs/nom/4.0.0/src/nom/nom.rs.html#216 – Alex Huszagh Sep 07 '19 at 20:36

Up-to-date solution using nom's AsChar trait:

take_while1(AsChar::is_dec_digit)(input)
TeNNoX