
I'm working on a custom type where I have the following requirements:

  1. A collection of elements that avoids heap allocation. I am using an array instead of a Vec.
  2. The collection contains non-copyable types
  3. Implements Default for types that also implement Default
  4. Implements From so that I can build it straight from an array

My biggest problem is implementing Default in a safe and useful way. Supporting movable (non-Copy) types in the array has been the main challenge. Initially I blindly used mem::uninitialized() followed by a for loop of ptr::write(&mut data[index], Element::default()) calls to initialize it, but I found that if any individual default() call panicked, the unwind would then call drop on all of the uninitialized data in the array.
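
Roughly, that first attempt looked like this (a sketch only, not my exact code, reusing the Element and CAPACITY names from the full listing below):

fn default() -> Self {
    // Broken: if any Element::default() call panics, unwinding drops every
    // slot of `data`, including the slots that were never written.
    let mut data: [Element; CAPACITY] = unsafe { std::mem::uninitialized() };
    for index in 0..CAPACITY {
        unsafe { std::ptr::write(&mut data[index], Element::default()) };
    }
    Self { data }
}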

My next step was to use the nodrop crate to prevent that. Drop is now no longer called on any uninitialized data, but if any element panics in default(), the elements before it that were correctly built are never dropped either.

Is there any way to either tell the Rust compiler that it is safe to call drop on the previously built array elements, or is there a different way to approach this?

To be clear, if one of the individual calls to Element::default() panics, I want:

  1. drop to not be called on the uninitialized elements
  2. drop to be called on the properly initialized elements

I'm not sure it is possible based on what I have read so far, but I wanted to check.

This code shows where I am at:

extern crate nodrop;

use nodrop::NoDrop;

struct Dummy;
impl Drop for Dummy {
    fn drop(&mut self) {
        println!("dropping");
    }
}
impl Default for Dummy {
    fn default() -> Self {
        unsafe {
            // Succeed three times, then panic, to simulate a constructor
            // failing partway through initializing the array.
            static mut COUNT: usize = 0;
            if COUNT < 3 {
                COUNT += 1;
                println!("default");
                return Dummy {};
            } else {
                panic!("oh noes!");
            }
        }
    }
}

const CAPACITY: usize = 5;

struct Composite<Element> {
    data: [Element; CAPACITY],
}

impl<Element> Default for Composite<Element>
where
    Element: Default,
{
    fn default() -> Self {
        // NoDrop prevents the destructor of `temp` (and therefore of the
        // partially uninitialized array) from running if a panic unwinds
        // past this point.
        let mut temp: NoDrop<Self> = NoDrop::new(Self {
            data: unsafe { std::mem::uninitialized() },
        });

        unsafe {
            for index in 0..CAPACITY {
                std::ptr::write(&mut temp.data[index], Element::default());
            }
        }

        return temp.into_inner();
    }
}

impl<Element> From<[Element; CAPACITY]> for Composite<Element> {
    fn from(value: [Element; CAPACITY]) -> Self {
        return Self { data: value };
    }
}

pub fn main() {
    let _v1: Composite<Dummy> = Composite::default();
}


It gets as far as ensuring that drop is not called on the uninitialized elements of the array, but it does not yet arrange for the properly initialized elements to be dropped (they behave like the uninitialized ones and are never dropped). I force Element::default() to panic on a later element just to show the issue.

Actual output

Standard Error:

Compiling playground v0.0.1 (file:///playground)
Finished dev [unoptimized + debuginfo] target(s) in 0.56 secs
 Running `target/debug/playground`
thread 'main' panicked at 'oh noes!', src/main.rs:19:17
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Standard Output:

default
default
default

Intended output

Standard Error:

Compiling playground v0.0.1 (file:///playground)
Finished dev [unoptimized + debuginfo] target(s) in 0.56 secs
 Running `target/debug/playground`
thread 'main' panicked at 'oh noes!', src/main.rs:19:17
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Standard Output:

default
default
default
dropping
dropping
dropping

1 Answer


Is there a way to tell the Rust compiler to call drop on partially-initialized array elements when handling a panic?

No, but you can call drop yourself. You need to run code when a panic occurs.

Brute-force solution

This uses the building blocks of catch_unwind, resume_unwind, and AssertUnwindSafe to notice that a panic occurred and run some cleanup code:

fn default() -> Self {
    use std::panic::{self, AssertUnwindSafe};

    let mut temp = NoDrop::new(Self {
        data: unsafe { std::mem::uninitialized() },
    });

    // Number of elements that have been successfully written so far.
    let mut valid = 0;

    // Run the initialization inside catch_unwind so a panic can be observed;
    // AssertUnwindSafe is needed because the closure captures &mut references.
    let panicked = {
        let mut temp = AssertUnwindSafe(&mut temp);
        let mut valid = AssertUnwindSafe(&mut valid);

        std::panic::catch_unwind(move || unsafe {
            for index in 0..CAPACITY {
                std::ptr::write(&mut temp.data[index], Element::default());
                **valid += 1;
            }
        })
    };

    // A panic occurred: ptr::read moves each fully initialized element out
    // so that it is dropped, then the panic continues unwinding.
    if let Err(e) = panicked {
        for i in 0..valid {
            unsafe { std::ptr::read(&temp.data[i]) };
        }

        panic::resume_unwind(e);
    }

    temp.into_inner()
}

Slightly nicer

Once you recognize that a type's Drop implementation still runs while a panic unwinds, you can use that to your advantage by creating a drop bomb: a type that performs cleanup when it is dropped, and which you defuse on the success path so that it is never dropped:

extern crate nodrop;

use nodrop::NoDrop;

use std::{mem, ptr};

const CAPACITY: usize = 5;
type Data<T> = [T; CAPACITY];

// A drop bomb: holds a possibly partially initialized array and remembers
// how many leading elements are valid.
struct Temp<T> {
    data: NoDrop<Data<T>>,
    valid: usize,
}

impl<T> Temp<T> {
    unsafe fn new() -> Self {
        Self {
            data: NoDrop::new(mem::uninitialized()),
            valid: 0,
        }
    }

    unsafe fn push(&mut self, v: T) {
        if self.valid < CAPACITY {
            ptr::write(&mut self.data[self.valid], v);
            self.valid += 1;
        }
    }

    // Success path: move the array out and defuse the bomb with mem::forget
    // so that `Temp`'s Drop implementation never runs.
    unsafe fn into_inner(mut self) -> Data<T> {
        let data = mem::replace(&mut self.data, mem::uninitialized());
        mem::forget(self);
        data.into_inner()
    }
}

// Panic path: drop only the elements that were fully initialized.
impl<T> Drop for Temp<T> {
    fn drop(&mut self) {
        unsafe {
            for i in 0..self.valid {
                ptr::read(&self.data[i]);
            }
        }
    }
}

struct Composite<T>(Data<T>);

impl<T> Default for Composite<T>
where
    T: Default,
{
    fn default() -> Self {
        unsafe {
            let mut tmp = Temp::new();

            for _ in 0..CAPACITY {
                tmp.push(T::default());
            }

            Composite(tmp.into_inner())
        }
    }
}

impl<T> From<Data<T>> for Composite<T> {
    fn from(value: Data<T>) -> Self {
        Composite(value)
    }
}

struct Dummy;

impl Drop for Dummy {
    fn drop(&mut self) {
        println!("dropping");
    }
}

impl Default for Dummy {
    fn default() -> Self {
        use std::sync::atomic::{AtomicUsize, Ordering, ATOMIC_USIZE_INIT};

        static COUNT: AtomicUsize = ATOMIC_USIZE_INIT;

        let count = COUNT.fetch_add(1, Ordering::SeqCst);
        if count < 3 {
            println!("default");
            Dummy {}
        } else {
            panic!("oh noes!");
        }
    }
}

pub fn main() {
    let _v1: Composite<Dummy> = Composite::default();
}

Note that I've made some unrelated cleanups:

  1. Using an atomic variable instead of an unsafe mutable static.
  2. Not using return as the last expression of a block.
  3. Converting Composite into a newtype, as data isn't a wonderful field name.
  4. Importing the mem and ptr modules for easier access.
  5. Creating the Data<T> type alias to avoid retyping that detail.

An elegant solution

The choice of push in the second solution is no accident. Temp is a poor implementation of a variable-sized stack-allocated vector. There's a good implementation called arrayvec which we can use instead:

extern crate arrayvec;

use arrayvec::ArrayVec;

const CAPACITY: usize = 5;
type Data<T> = [T; CAPACITY];

struct Composite<T>(Data<T>);

impl<T> Default for Composite<T>
where
    T: Default,
{
    fn default() -> Self {
        // If T::default() panics partway through, the partially filled
        // ArrayVec is dropped during unwinding, which drops exactly the
        // elements that were already constructed.
        let tmp: ArrayVec<_> = (0..CAPACITY).map(|_| T::default()).collect();

        match tmp.into_inner() {
            Ok(data) => Composite(data),
            Err(_) => panic!("Didn't insert enough values"),
        }
    }
}

Would you be surprised to learn that nodrop was created in large part to be used for arrayvec? The same author created both!

  • Thank you very much! Two followup questions for my own info: 1. Do .map() and .collect() not use heap allocation? I've seen them used in context of collections, which typically do. 2. In the std::mem::replace() call, I know that std::mem::uninitialized() doesn't do anything normally, but since it is passed in as a parameter, is another equally sized array created temporarily to pass into replace()? Asking in case I have large arrays. – Jere Apr 29 '18 at 00:42
  • 1
    @Jere no, `map` and `collect` do not use the heap (why would they even need to?). They are [provided by libcore](https://doc.rust-lang.org/core/iter/trait.Iterator.html), which doesn't even have access to an allocator. You can see the implementations of [`map`](https://github.com/rust-lang/rust/blob/1.25.0/src/libcore/iter/mod.rs#L1308-L1310) and [`collect`](https://github.com/rust-lang/rust/blob/1.25.0/src/libcore/iter/iterator.rs#L1302-L1304) to verify for yourself. – Shepmaster Apr 29 '18 at 00:55
  • 1
    @Jere it is possible that the copying the uninitialized array may incur some overhead, but returning the array from the call to `default` may also require moving it, and every time you move the `Composite` has the same issue. The docs for [`ArrayVec::into_inner`](https://docs.rs/arrayvec/0.4.7/arrayvec/struct.ArrayVec.html#method.into_inner) mention some overhead, which I *assume* is referring to this. – Shepmaster Apr 29 '18 at 00:58