3

I have multiple threads doing a computation and want to collect the results into a pre-allocated vector. To turn off the borrow checker, I wrote the function:

fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;
    unsafe { *first_elem.add(idx) = val }
}

With that we can fill a vector concurrently (e.g. using Rayon):

let vec = vec![0; 10];
(0..10).into_par_iter().for_each(|i| set_unsync(&vec, i, i));

It compiles, it works, and even Clippy likes it, but is it sound? After reading about things that appear to be sound but actually are Undefined Behavior, I'm unsure. For example, the documentation of the as_ptr method says:

The caller must also ensure that the memory the pointer (non-transitively) points to is never written to (except inside an UnsafeCell) using this pointer or any pointer derived from it.

Strictly speaking, the solution violates this. However, it feels sound to me. If it is not, how can we let multiple threads write to nonoverlapping parts of the same vector without using locks?

user1494080
  • 2,064
  • 2
  • 17
  • 36
  • 1
    Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/248718/discussion-on-question-by-user1494080-is-it-sound-to-use-raw-pointers-to-an-allo). – Samuel Liew Oct 11 '22 at 01:49

1 Answers1

4

Assuming this is your minimal reproducible example:

use rayon::prelude::*;

fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;
    unsafe { *first_elem.add(idx) = val }
}

fn main() {

    let input = vec![2, 3, 9];
    let output = vec![0; 100];
    input.par_iter().for_each(|&i| {
        for j in i * 10..(i + 1) * 10 {
            set_unsync(&output, j, i);
        }
    });

    dbg!(output);
}

If you are asking of whether this code works and always will work, then I'd answer with yes.

BUT: it violates many rules on how safe and unsafe code should interact with each other.

If you write a function that is not marked unsafe, you indicate that this method can be abused by users in any way possible without causing undefined behaviour (note that "users" here is not just other people, this also means your own code in safe sections). If you cannot guarantee this, you should mark it unsafe, requiring the caller of the function to mark the invocation as unsafe as well, because the caller then again has to make sure he is using your function correctly. And every point in your code that required a programmer to manually prove that it is free of undefined behaviour must require an unsafe as well. If it's possible to have sections that require a human to prove this, but do not require an unsafe, there is something unsound in your code.

In your case, the set_unsync function is not marked unsafe, but the following code causes undefined behaviour:

fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;
    unsafe { *first_elem.add(idx) = val }
}

fn main() {
    let output = vec![0; 5];

    set_unsync(&output, 100000000000000, 42);

    dbg!(output);
}

Not that at no point in your main did you need an unsafe, and yet a segfault is happening here.

Now if you say "but set_unsync is not pub, so no one else can call it. And I, in my par_iter, have ensured that I am using it correctly" - then this is the best indicator that you should mark set_unsync as unsafe. The act of "having to ensure to use the function correctly" is more or less the definition of an unsafe function. unsafe doesn't mean it will break horribly, it just means that the caller has to manually make sure that he is using it correctly, because the compiler can't. It's unsafe from the compiler's point of view.

Here is an example of how your code could be rewritten in a more sound way. I don't claim that it is 100% sound, because I haven't thought about it enough. But I hope this demonstrates how to cleanly interface between safe and unsafe code:

use rayon::prelude::*;

// mark as unsafe, as it's possible to provide parameters that
// cause undefined behaviour
unsafe fn set_unsync(vec: &[usize], idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;

    // No need to use unsafe{} here, as the entire function is already unsafe
    *first_elem.add(idx) = val
}

// Does not need to be marked `unsafe` as no combination of parameters
// could cause undefined behaviour.
// Also note that output is marked `&mut`, which is also crucial.
// Mutating data behind a non-mutable reference is also considered undefined
// behaviour.
fn do_something(input: &[usize], output: &mut [usize]) {
    input.par_iter().for_each(|&i| {
        // This assert is crucial for soundness, otherwise an incorrect value
        // in `input` could cause an out-of-bounds access on `output`
        assert!((i + 1) * 10 <= output.len());

        for j in i * 10..(i + 1) * 10 {
            unsafe {
                // This is the critical point where we interface
                // from safe to unsafe code.
                // This call requires the programmer to manually verify that
                // `set_unsync` never gets called with dangerous parameters.
                set_unsync(&output, j, i);
            }
        }
    });
}

fn main() {
    // note that we now have to declare output `mut`, as it should be
    let input = vec![2, 3, 9];
    let mut output = vec![0; 100];

    do_something(&input, &mut output);

    dbg!(output);
}
Finomnis
  • 18,094
  • 1
  • 20
  • 27
  • Thanks for the comprehensive explanation. – user1494080 Oct 11 '22 at 00:02
  • 2
    A slightly more sound implementation that Miri likes, using `SyncUnsafeCell`: [playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=2b6a44a57e365ea2ff633e1598c33d16). This never mutates behind an immutable reference or aliases a mutable reference and appears to be compatible with stacked borrows. – PitaJ Oct 11 '22 at 16:01
  • @PitaJ "Compatible with stacked borrows" - why isn't mine? Not criticizing, just curious :) – Finomnis Oct 12 '22 at 07:26
  • Yours triggers a MIRI warning for stacked borrows, I think because you mutate behind a shared reference without an UnsafeCell. – PitaJ Oct 12 '22 at 13:32
  • Nice, I wasn't aware there is a multithreaded version of `UnsafeCell`. Unfortunately, it's only available in nightly. – user1494080 Oct 12 '22 at 14:42
  • As mentioned in the comment at the top, it's available in stable via crates.io as well. – PitaJ Oct 12 '22 at 16:54