
To parse a large file line-by-line I was trying to use itertools::process_results in the following manner, having it fail immediately if the file cannot be read:

use std::fs::File;
use std::io::{BufRead, BufReader};
use itertools::process_results;

fn parse_file() -> Result<(), Box<dyn std::error::Error>> {
    let reader = BufReader::new(File::open("file")?);
    let line_iter = process_results(reader.lines(), filter_lines);
    unimplemented!()
}

fn filter_lines<I: Iterator<Item=String>>(lines: I) -> impl Iterator<Item=String> {
    lines.filter(|line| {
        unimplemented!()
    })
}

However, the compiler is not happy with the signature of filter_lines and says something like

error[E0631]: type mismatch in function arguments

expected signature of for<'r> fn(itertools::process_results_impl::ProcessResults<'r, std::io::Lines<std::io::BufReader<std::fs::File>>, std::io::Error>) -> _

Since ProcessResults implements Iterator, I don't understand the problem here. How would the type of filter_lines need to be changed for it to work with process_results?

wonce
    For ease of experimentation: [playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=314c407fbc6f261585900bf6957967c6) – user4815162342 Sep 17 '20 at 14:37

1 Answer

Since ProcessResults implements Iterator, I don't understand the problem here.

The problem is not in the trait bounds, but in the lifetime of the value returned by filter_lines. process_results hands you an iterator that borrows data local to the process_results invocation. You can use that iterator inside the callback, for example by iterating over it or passing it to other functions, but you are not allowed to return it from the callback, nor to return an iterator that wraps it, as filter_lines() does. This is because whatever the callback returns is in turn returned by process_results itself, and that value must not refer to data local to process_results.

For example, this implementation of filter_lines(), which goes through the iterator and returns a value based on the values found inside, compiles:

fn filter_lines<I: Iterator<Item = String>>(lines: I) -> Option<usize> {
    lines.map(|line| line.len()).max()
}

Likewise, leaving your implementation of filter_lines, but not returning the iterator from the callback, also compiles:

// with filter_lines as originally defined
let line_max = process_results(reader.lines(), |lines| filter_lines(lines).max());
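To make the rule concrete without depending on itertools, here is a minimal std-only sketch. `Borrowing` and `with_local` are hypothetical stand-ins that I am assuming have the same shape as `ProcessResults` and `process_results` (a callback that receives an iterator borrowing local state); they are not the library's actual code. `max_of` is likewise just an illustrative consumer.

```rust
// Hypothetical stand-in for itertools' ProcessResults: an iterator that
// borrows state owned by the function that creates it.
struct Borrowing<'a> {
    data: &'a [u32],
    pos: usize,
}

impl<'a> Iterator for Borrowing<'a> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let item = self.data.get(self.pos).copied();
        self.pos += 1;
        item
    }
}

// Hypothetical stand-in for process_results: the callback has to accept the
// iterator for every lifetime 'a, and the result type R cannot mention 'a.
fn with_local<F, R>(f: F) -> R
where
    F: for<'a> FnOnce(Borrowing<'a>) -> R,
{
    let local = vec![3, 1, 4];
    f(Borrowing { data: &local, pos: 0 })
}

// A generic consumer, similar in spirit to the Option<usize> example above.
fn max_of<I: Iterator<Item = u32>>(it: I) -> Option<u32> {
    it.max()
}

fn main() {
    // Fine: the iterator is fully consumed inside the callback.
    let m = with_local(|it| max_of(it));
    assert_eq!(m, Some(4));

    // Expected to be rejected by the compiler, because the returned
    // iterator would still borrow `local`:
    // let escaped = with_local(|it| it.filter(|n| *n > 1));
}
```

Note that the closure wrapper is doing real work here. As far as I can tell, passing `max_of` directly would require choosing a single type for `I`, while the bound asks for a callback that works for every lifetime, which may be why the bare function is rejected even when nothing borrowed escapes.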

If you want to build a new iterator that stops when an existing iterator encounters an error, you won't be able to use process_results(). Instead, you will need to use take_while or scan with some outside state, as described here. Applied to your code, scan would be used like this:

fn parse_file() -> Result<(), Box<dyn std::error::Error>> {
    let reader = BufReader::new(File::open("file")?);
    let mut err = Ok(());
    let line_iter = reader.lines().scan(&mut err, until_err);
    let new_iter = filter_lines(line_iter);
    // some processing, e.g.:
    new_iter.for_each(|line| todo!());
    err?;  // check whether iteration was ended by error Result
    unimplemented!()
}

// filter_lines as originally written...

// helper function for `Iterator::scan()`
fn until_err<T, E>(err: &mut &mut Result<(), E>, item: Result<T, E>) -> Option<T> {
    match item {
        Ok(item) => Some(item),
        Err(e) => {
            **err = Err(e);
            None
        }
    }
}
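For experimentation, here is a self-contained sketch of the same scan approach that runs without a file on disk. It assumes an in-memory `Cursor` as the reader, a hypothetical `collect_filtered` wrapper, and a placeholder filter that drops empty lines; none of these come from the question. Invalid UTF-8 is used only as a convenient way to make `lines()` yield an `Err`.

```rust
use std::io::{self, BufRead, Cursor};

// Same helper as above: stores the first error and ends the iteration.
fn until_err<T, E>(err: &mut &mut Result<(), E>, item: Result<T, E>) -> Option<T> {
    match item {
        Ok(item) => Some(item),
        Err(e) => {
            **err = Err(e);
            None
        }
    }
}

// Placeholder filter (an assumption): keep only non-empty lines.
fn filter_lines<I: Iterator<Item = String>>(lines: I) -> impl Iterator<Item = String> {
    lines.filter(|line| !line.is_empty())
}

// Hypothetical wrapper: filter all lines, or report the first read error.
fn collect_filtered<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut err = Ok(());
    let kept: Vec<String> =
        filter_lines(reader.lines().scan(&mut err, until_err)).collect();
    err?; // propagate the error that stopped the iteration, if any
    Ok(kept)
}

fn main() {
    // All lines are valid: the empty line is filtered out.
    let ok = collect_filtered(Cursor::new("a\n\nb\n")).unwrap();
    assert_eq!(ok, vec!["a".to_string(), "b".to_string()]);

    // The second line is not valid UTF-8, so lines() yields an Err there.
    let bad = collect_filtered(Cursor::new(&b"a\n\xff\nb\n"[..]));
    assert!(bad.is_err());
}
```

The important detail is that `err` is checked only after the iterator has been fully consumed. Checking it earlier would not compile while the iterator still holds the mutable borrow, and would miss an error that has not been reached yet.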
user4815162342
  • Strange, I still get a type error when not returning the iterator: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=77c332718e8b20cea4e79bb18ea3b71d – wonce Sep 17 '20 at 15:34
  • @wonce Interesting, it works only with a closure in-between, as in my answer. – user4815162342 Sep 17 '20 at 15:36
  • So the lifetime is actually a deeper, second problem. I assume the closure resolves the type error because it is inferred to have the correct type, which is mysteriously compatible yet incompatible with `filter_lines`. – wonce Sep 17 '20 at 15:40