-1

I'd like to implement a data structure that stores values in 2 different orders.

Values are inserted once (in both orders) and cannot be removed without dropping the entire data structure ("append only"). They can still be mutated in place.

The values can be accessed immutably or mutably through either one of the orders and the appropriate functions take an immutable borrow of self or a mutable borrow of self, respectively. Mutating a value is applied to the value in both orders.

A simplified version of the code looks something like this:

pub struct TwoOrders<V> {
    first_order: Vec<*mut V>,
    second_order: Vec<*mut V>,
}

impl<V> TwoOrders<V> {
    pub fn new() -> Self {
        Self {
            first_order: vec![],
            second_order: vec![],
        }
    }

    pub fn insert(&mut self, value: V) {
        let ptr = Box::into_raw(Box::new(value));

        // insert ptr into self.first_order in some position

        // insert ptr into self.second_order in another position
    }

    pub fn get_from_first_order(&self, index: usize) -> Option<&V> {
        self.first_order
            .get(index)
            .map(|v| unsafe { v.as_ref().unwrap() })
    }

    pub fn get_from_second_order(&self, index: usize) -> Option<&V> {
        self.second_order
            .get(index)
            .map(|v| unsafe { v.as_ref().unwrap() })
    }

    pub fn get_mut_from_first_order(
        &mut self,
        index: usize,
    ) -> Option<&mut V> {
        self.first_order
            .get_mut(index)
            .map(|v| unsafe { v.as_mut().unwrap() })
    }

    pub fn get_mut_from_second_order(
        &mut self,
        index: usize,
    ) -> Option<&mut V> {
        self.second_order
            .get_mut(index)
            .map(|v| unsafe { v.as_mut().unwrap() })
    }
}

impl<V> Drop for TwoOrders<V> {
    fn drop(&mut self) {
        self.first_order.clear();
        for value_ptr in self.second_order.drain(..) {
            std::mem::drop(unsafe { Box::from_raw(value_ptr) })
        }
    }
}

My question is: is this actually safe?

I am assuming that since a mutable reference to a value is attained only with a mutable borrow of self, it means that the borrow checker guarantees that there is only one exclusive reference to this value during its lifetime, so the struct itself is conceptually like a compile-time lock.

The raw pointers are never exposed outside the struct and no function allows any access to the values in a way that breaks the safety requirements of references.

Box::into_raw stops the values from being automatically dropped, and only one copy of the raw pointers is dropped with a manually implemented Drop (using Box::from_raw).

Are my assumptions correct?

Another question is: is there any existing construct in rust (in the standard library or otherwise) that would solve this use case without using raw pointers, but also without having to use a reference counted pointer or locks?

Yaar Hever
  • 81
  • 1
  • 6
  • 1
    It looks safe to me, but I would wait for an opinion from someone more knowledgeable in the matter. As for alternative ways to implement such a structure without pointers, what you could do is store three vectors: one for the actual values, and two vectors containing `usize`s that define both the orders by mapping an index in the order to the index of the corresponding value in the value vector. When you have to retrieve a value via any order, you first look up the index in the order vector and then return the value at that index in the value vector. – EvilTak Jan 04 '21 at 05:13
  • Thanks. My main question is whether I'm right in trusting the borrow checker of `self` when doing the internal logic with the raw pointers. I was considering the option you suggested with a `usize` vector. The actual implementation has `BTreeMap`s with different keys that are immutable. I would like to use `rayon` internally for multithreading and I think that this would require me to allocate a vector of mutable references every time... – Yaar Hever Jan 04 '21 at 11:19
  • Upon further inspection, it looks like `rayon` internally allocates a vector for its parallel iterators of `BTreeMap`: [here](https://github.com/rayon-rs/rayon/blob/1f069d77101bbc42e4a42c4a5aec16b326dc4e32/src/collections/btree_map.rs#L58) and [here](https://github.com/rayon-rs/rayon/blob/1f069d77101bbc42e4a42c4a5aec16b326dc4e32/src/collections/mod.rs#L18), so I might as well do it myself. – Yaar Hever Jan 04 '21 at 11:41
  • 1
    From a first glance the code seems safe. However, I'm not so sure. But here might be the explanation from jonhoo a good starting point: https://youtu.be/EY7Wi9fV5bk Maybe you find something that helps you to check if it is safe. – schrieveslaach Jan 04 '21 at 12:57
  • I watched this wonderful youtube stream by Jon Gjengset and definitely learned a lot from it, but his considerations are of a somewhat different use case. I don't need to cast the collection or share it, I have 2 different collections that share the same values. So, 2 aliased `Box`es are bad. But what about 2 raw pointers from a leaked box (so only 1 original `Box`)? Do I still need `MaybeUninit`? Would `ManuallyDrop` help me here? – Yaar Hever Jan 04 '21 at 15:06

1 Answers1

2

A simplified version of the code looks something like this

If this is not the complete code then no one can say for certain. Any other unsafe usage could mean these unsafe blocks really aren't safe. Whether an unsafe usage is safe cannot be determined in isolation.

However, it looks like you have the basics. Only exposing the internals via &self -> &T and &mut self -> &mut T is a great rule of thumb. Using raw pointers instead of trying to keep the Boxs is the right way to go. You do not need MaybeUninit or ManuallyDrop.

Something you are missing though is a PhantomData<V> field in the TwoOrders struct:

pub struct TwoOrders<V> {
    first_order: Vec<*mut V>,
    second_order: Vec<*mut V>,

    _marker: PhantomData<V>,
}

This tells the compiler that your struct owns and may run destructors for Vs. This is needed so that the compiler can do drop check analysis that guards against lifetime issues in drops. See the Rust Nomicon on PhantomData.

kmdreko
  • 42,554
  • 6
  • 57
  • 106
  • Thanks a lot for your answer! I'm glad to get the confirmation that what I'm doing seems correct (now for the 3rd time). I was not aware of the requirement of using `PhantomData` with owned values, I thought I would only need it for lifetimes. The chapter in the Nomicon was very helpful. I wish there was some checklist I could follow for safety consideration of unsafe code... – Yaar Hever Jan 05 '21 at 12:04