
It's not that I can't return Rust iterators from a Python module function using pyo3. The problem arises when a lifetime doesn't live long enough!

Allow me to explain.

First attempt:

#[pyclass]
struct ItemIterator {
    iter: Box<dyn Iterator<Item = u64> + Send>,
}

#[pymethods]
impl ItemIterator {
    fn __iter__(slf: PyRef<'_, Self>) -> PyRef<'_, Self> {
        slf
    }
    fn __next__(mut slf: PyRefMut<'_, Self>) -> Option<u64> {
        slf.iter.next()
    }
}

#[pyfunction]
fn get_numbers() -> ItemIterator {
    let i = vec![1u64, 2, 3, 4, 5].into_iter();
    ItemIterator { iter: Box::new(i) }
}

In the contrived example above I have written a Python iterator wrapper for our Rust iterator as per the pyo3 guide, and it works seamlessly.

Second attempt: The problem is when lifetimes are involved.

Say I now have a Warehouse struct that I want to make available as a Python class, along with its associated functions.

struct Warehouse {
    items: Vec<u64>,
}

impl Warehouse {
    fn new() -> Warehouse {
        Warehouse {
            items: vec![1u64, 2, 3, 4, 5],
        }
    }

    fn get_items(&self) -> Box<dyn Iterator<Item = u64> + '_> {
        Box::new(self.items.iter().map(|f| *f))
    }
}

Implementing them as a Python class and methods:

#[pyclass]
struct ItemIterator {
    iter: Box<dyn Iterator<Item = u64> + Send>,
}

#[pymethods]
impl ItemIterator {
    fn __iter__(slf: PyRef<'_, Self>) -> PyRef<'_, Self> {
        slf
    }
    fn __next__(mut slf: PyRefMut<'_, Self>) -> Option<u64> {
        slf.iter.next()
    }
}

#[pyclass]
struct Warehouse {
    items: Vec<u64>,
}

#[pymethods]
impl Warehouse {
    #[new]
    fn new() -> Warehouse {
        Warehouse {
            items: vec![1u64, 2, 3, 4, 5],
        }
    }

    fn get_items(&self) -> ItemIterator {
        ItemIterator {
            iter: Box::new(self.items.iter().map(|f| *f)),
        }
    }
}

This throws a compiler error in the get_items function:

error: lifetime may not live long enough
  --> src/lib.rs:54:19
   |
52 |     fn get_items(&self) -> ItemIterator {
   |                  - let's call the lifetime of this reference `'1`
53 |         ItemIterator {
54 |             iter: Box::new(self.items.iter().map(|f| *f)),
   |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cast requires that `'1` must outlive `'static`

error: could not compile `pyo3-example` due to previous error

I am not really sure how to fix this. Can someone explain what's really going on here? How does this compare to my first attempt at implementing iterators, and how do I fix it?

iamsmkr
  • I don't know enough about pyo3 to suggest a solution, but I think the root of the issue is that the `Box<dyn Iterator<Item = u64> + Send>` has no explicit lifetime, but the `Iterator` you're trying to put in the box borrows from `self` in `get_items`. If this were allowed, there would be nothing stopping you from dropping the `Warehouse` and causing a use-after-free. The fix for this would be to change it to `ItemIterator<'a>` and `Box<dyn Iterator<Item = u64> + Send + 'a>`, but pyo3 doesn't allow you to make types with lifetimes into a `pyclass`... – Joe Clay Feb 15 '23 at 13:05

2 Answers

1

If we remove the python stuff:

struct ItemIterator {
    iter: Box<dyn Iterator<Item = u64> + Send>,
}

impl ItemIterator {
    fn __iter__(&self) -> &'_ ItemIterator {
        self
    }
    fn __next__(&mut self) -> Option<u64> {
        self.iter.next()
    }
}

We see the same error:

error: lifetime may not live long enough
  --> src/lib.rs:21:19
   |
19 |     fn get_items(&self) -> ItemIterator {
   |                  - let's call the lifetime of this reference `'1`
20 |         ItemIterator {
21 |             iter: Box::new(self.items.iter().map(|f| *f)),
   |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cast requires that `'1` must outlive `'static`

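The problem is that the iterator holds a reference to the underlying data, but there is nothing in the type to indicate so. A Box<dyn Iterator<Item = u64> + Send> with no lifetime annotation defaults to 'static, so it may not borrow anything shorter-lived. When you then try to construct an instance that does hold a reference (here, to self.items), Rust is going to let you know about it.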

Without the Python FFI it can be easily fixed with an extra lifetime on the iterator:

struct ItemIterator<'a> {
    iter: Box<dyn Iterator<Item = u64> + Send + 'a>,
}
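To make the plain-Rust fix concrete, here is a minimal sketch without pyo3. The Iterator impl for ItemIterator and the use of copied() are additions for illustration, not part of the original code:

```rust
struct Warehouse {
    items: Vec<u64>,
}

struct ItemIterator<'a> {
    // `+ 'a` says the boxed iterator may borrow data that lives for `'a`.
    iter: Box<dyn Iterator<Item = u64> + Send + 'a>,
}

impl<'a> Iterator for ItemIterator<'a> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        self.iter.next()
    }
}

impl Warehouse {
    fn new() -> Warehouse {
        Warehouse {
            items: vec![1u64, 2, 3, 4, 5],
        }
    }

    // The elided lifetime `'_` ties the returned iterator to `&self`,
    // so the compiler rejects any use of it after the Warehouse is gone.
    fn get_items(&self) -> ItemIterator<'_> {
        ItemIterator {
            iter: Box::new(self.items.iter().copied()),
        }
    }
}
```

With this, Warehouse::new().get_items() no longer compiles if the iterator is kept around past the temporary Warehouse, which is exactly the guarantee the original error was protecting.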

Unfortunately, this won't work with the Python bindings because lifetimes and generics are not supported by pyo3. This is going to be annoying because it means that your iterator must own all of the items.

One quick fix would be to clone the vector so that the iterator owns its items. That way, no lifetimes are needed. This should work, but will be very inefficient if there is a lot of data.
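As a rough sketch of the cloning approach, again in plain Rust without the pyo3 attributes (the items_after_drop helper is hypothetical and only there to show that the iterator outlives the Warehouse):

```rust
struct Warehouse {
    items: Vec<u64>,
}

struct ItemIterator {
    // No lifetime annotation: the trait object defaults to `'static`,
    // which is satisfied because the iterator owns its data.
    iter: Box<dyn Iterator<Item = u64> + Send>,
}

impl Iterator for ItemIterator {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        self.iter.next()
    }
}

impl Warehouse {
    fn get_items(&self) -> ItemIterator {
        // Copy the whole vector up front; costs O(n) time and memory.
        let owned: Vec<u64> = self.items.clone();
        ItemIterator {
            iter: Box::new(owned.into_iter()),
        }
    }
}

// Hypothetical helper: the iterator stays valid after the Warehouse is dropped.
fn items_after_drop() -> Vec<u64> {
    let warehouse = Warehouse {
        items: vec![1u64, 2, 3, 4, 5],
    };
    let iter = warehouse.get_items();
    drop(warehouse);
    iter.collect()
}
```

The up-front copy is what makes this costly for large vectors: every call to get_items duplicates the data before the first item is yielded.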

Another approach is shared ownership: a reference-counting smart pointer (Rc or Arc) combined with interior mutability (RefCell, RwLock or Mutex). However, this change has a knock-on effect: every use of this vector will need to be changed to deal with the smart pointer. Note also that Rc and RefCell are not Send, so a #[pyclass] holding them has to be declared unsendable; Arc with a Mutex or RwLock avoids that.

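use std::{cell::RefCell, rc::Rc};

use pyo3::prelude::*;

// Rc<RefCell<_>> is not Send, so the class must be marked `unsendable`;
// it can then only be used from the thread that created it.
#[pyclass(unsendable)]
struct ItemIterator {
    // Shared handle to the same vector the Warehouse holds.
    items: Rc<RefCell<Vec<u64>>>,
    index: usize,
}

#[pymethods]
impl ItemIterator {
    fn __iter__(slf: PyRef<'_, Self>) -> PyRef<'_, Self> {
        slf
    }
    fn __next__(&mut self) -> Option<u64> {
        // Look the item up by position instead of holding a borrowing iterator.
        let item = self.items.borrow().get(self.index).copied();
        self.index += 1;
        item
    }
}

#[pyclass(unsendable)]
struct Warehouse {
    items: Rc<RefCell<Vec<u64>>>,
}

#[pymethods]
impl Warehouse {
    #[new]
    fn new() -> Warehouse {
        Warehouse {
            items: Rc::new(RefCell::new(vec![1u64, 2, 3, 4, 5])),
        }
    }

    fn get_items(&self) -> ItemIterator {
        ItemIterator {
            // Cloning the Rc only bumps a reference count; the data is not copied.
            items: Rc::clone(&self.items),
            index: 0,
        }
    }
}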

This should now work because the exposed types and functions do not use lifetimes.
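If the classes need to be usable from more than one thread, a thread-safe variant of the same idea would swap Rc and RefCell for Arc and Mutex. The following is a minimal sketch in plain Rust (no pyo3 attributes); the collect_on_other_thread helper is hypothetical and only demonstrates that the iterator is Send:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

struct Warehouse {
    items: Arc<Mutex<Vec<u64>>>,
}

struct ItemIterator {
    // Shares ownership of the vector with the Warehouse.
    items: Arc<Mutex<Vec<u64>>>,
    index: usize,
}

impl Iterator for ItemIterator {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        // The lock is held only for the duration of this statement.
        let item = self.items.lock().unwrap().get(self.index).copied();
        self.index += 1;
        item
    }
}

impl Warehouse {
    fn new() -> Warehouse {
        Warehouse {
            items: Arc::new(Mutex::new(vec![1u64, 2, 3, 4, 5])),
        }
    }

    fn get_items(&self) -> ItemIterator {
        ItemIterator {
            // Bumps the reference count; the vector itself is not copied.
            items: Arc::clone(&self.items),
            index: 0,
        }
    }
}

// Hypothetical helper: moves the iterator to another thread and drains it there.
fn collect_on_other_thread(warehouse: &Warehouse) -> Vec<u64> {
    let iter = warehouse.get_items();
    thread::spawn(move || iter.collect::<Vec<u64>>())
        .join()
        .unwrap()
}
```

Because the iterator indexes into the shared vector rather than snapshotting it, changes made to the vector between calls to next are visible to the iterator. Whether that is desirable depends on the use case.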

Peter Hall
  • This results in a bunch of other errors: (1) error: #[pyclass] cannot have lifetime parameters, (2) error: #[pymethods] cannot be used with lifetime parameters or generics, (3) error[E0277]: the trait bound `ItemIterator<'a>: PyClass` is not satisfied, (4) error[E0277]: the trait bound `ItemIterator<'a>: PyClass` is not satisfied – iamsmkr Feb 15 '23 at 14:10
  • In that case, you'll have to rethink the design to avoid borrowing. – Peter Hall Feb 15 '23 at 14:11
  • Problem is that the iterator is coming from higher abstractions in the library which is used extensively in the rest of the library which we don't really want to mess with. How would you suggest to avoid borrowing in this case anyway? – iamsmkr Feb 15 '23 at 14:21
  • I can't think of a way to do it efficiently. One inefficient way is to collect all items into a vector and then return an iterator that owns all of the items. This restriction is very limiting! – Peter Hall Feb 15 '23 at 17:25
  • @iamsmkr Reference-counting is better. I've updated my answer to use Rc instead. – Peter Hall Feb 15 '23 at 19:18
  • Unlike in the rather contrived example I created to understand this, the problem I am facing is that I am using one of our internal libraries that maps, filters, etc. over this vector (potentially huge) and eventually returns a boxed iterator, which is what the Python module I am trying to write receives after a bunch of API calls. Does this mean that to achieve what you have just demonstrated I would still need to collect all the elements in a vector? You can see this implementation here: https://github.com/Raphtory/docbrown/blob/features/loaders/pyraphtory/src/graphdb.rs#L177 – iamsmkr Feb 15 '23 at 22:02
0

Based on @peter-hall's suggestion I managed to implement a working (though inefficient) solution:

#[pyclass]
struct ItemIterator {
    iter: std::vec::IntoIter<u64>,
}

#[pymethods]
impl ItemIterator {
    fn __iter__(slf: PyRef<'_, Self>) -> PyRef<'_, Self> {
        slf
    }
    fn __next__(mut slf: PyRefMut<'_, Self>) -> Option<u64> {
        slf.iter.next()
    }
}

#[pyclass]
struct Warehouse {
    items: Vec<u64>,
}

#[pymethods]
impl Warehouse {
    #[new]
    fn new() -> Warehouse {
        Warehouse {
            items: vec![1u64, 2, 3, 4, 5],
        }
    }

    fn get_items(&self) -> ItemIterator {
        ItemIterator {
            // Copy the items into a new Vec so the iterator owns its data.
            iter: self.items.iter().copied().collect::<Vec<_>>().into_iter(),
        }
    }
}
iamsmkr