
In multi-core embedded Rust, is it appropriate to use a `static mut` for one-way data sharing from one core to the other?

Here’s the code (using embassy):

#![no_std]
// …

static mut CORE1_STACK: Stack<4096> = Stack::new();
static EXECUTOR0: StaticCell<Executor> = StaticCell::new();
static EXECUTOR1: StaticCell<Executor> = StaticCell::new();

static mut one_way_data_exchange: u8 = 0;


#[cortex_m_rt::entry]
fn main() -> ! {
    spawn_core1(p.CORE1, unsafe { &mut CORE1_STACK }, move || {
        let executor1 = EXECUTOR1.init(Executor::new());
        executor1.run(|spawner| unwrap!(spawner.spawn(core1_task())));
    });

    let executor0 = EXECUTOR0.init(Executor::new());
    executor0.run(|spawner| unwrap!(spawner.spawn(core0_task())));
}

#[embassy_executor::task]
async fn core0_task() {
    info!("Hello from core 0");
    loop {
        unsafe { one_way_data_exchange = 128; } // sensor value
    }
}

#[embassy_executor::task]
async fn core1_task() {
    info!("Hello from core 1");
    let mut sensor_val: u8 = 0;
    loop {
        unsafe { sensor_val = one_way_data_exchange; }

        // continue with rest of program
    }
}

If I were to be writing to the static var from both cores, that would obviously create a race condition. But if I only ever write from one core and only ever read from the other core, does that solve the race condition? Or, is it still problematic for both cores to be accessing it in parallel, even if only one is writing?

The order of read->write or write->read, in this case, doesn’t matter. One core is just creating a stream of IO input and the other dips into that stream whenever it’s ready to process the loop again, even if it misses some intermittent inputs.

risingtiger
    _"But if I only ever write from one core and only ever read from the other core"_ You might want to read about [hazards](https://en.wikipedia.org/wiki/Hazard_(computer_architecture)). More relevant reading: https://stackoverflow.com/questions/49174630/why-does-rust-disallow-mutable-aliasing – E_net4 May 26 '23 at 13:44
  • The order of read->write or write->read, in this case, doesn’t matter. One core is just creating a stream of IO input and the other dips into that stream whenever it’s ready to process the loop again, even if it misses some intermittent inputs. – risingtiger May 26 '23 at 14:19
  • My point is that in both cases it is a data race if you do not synchronize them. See the necessary conditions for a data race to happen [here](https://doc.rust-lang.org/nomicon/races.html#data-races-and-race-conditions). – E_net4 May 26 '23 at 14:23

1 Answer


No, it is still a problem.

  • Writes are not always guaranteed to be atomic. For example, on a 32-bit system a u64 takes multiple CPU instructions to write - so the reading side could observe a value with only half of it updated.
  • Even when the write itself is a single instruction, unsynchronized access from two cores is a data race, which is undefined behavior in Rust. This breaks soundness: the compiler assumes a `static mut` is not mutably aliased and is free to reorder, cache, or even hoist accesses out of your loop, so the code can misbehave in ways that have nothing to do with the hardware.

It is true that accesses to very simple primitive types like this can be atomic at the hardware level. You don't need static mut for it, though - there are mechanisms built into the language / core library so you don't have to resort to static mut. In this case, the important one would be atomics.

Atomics provide something called interior mutability. This means your value can be static without mut, and can be shared normally, and the type itself provides the mutability.

Let me demonstrate. As I don't have a microcontroller available right now, I rewrote your example for normal execution:

use core::time::Duration;

static mut ONE_WAY_DATA_EXCHANGE: u8 = 0;

fn main() {
    std::thread::scope(|s| {
        s.spawn(thread0);
        s.spawn(thread1);
    });
}

fn thread0() {
    std::thread::sleep(Duration::from_millis(500));
    unsafe { ONE_WAY_DATA_EXCHANGE = 42 };
}

fn thread1() {
    for _ in 0..4 {
        std::thread::sleep(Duration::from_millis(200));
        let value = unsafe { ONE_WAY_DATA_EXCHANGE };
        println!("{}", value);
    }
}
0
0
42
42

Here is how this looks when implemented with an atomic:

use core::{
    sync::atomic::{AtomicU8, Ordering},
    time::Duration,
};

static ONE_WAY_DATA_EXCHANGE: AtomicU8 = AtomicU8::new(0);

fn main() {
    std::thread::scope(|s| {
        s.spawn(thread0);
        s.spawn(thread1);
    });
}

fn thread0() {
    std::thread::sleep(Duration::from_millis(500));
    ONE_WAY_DATA_EXCHANGE.store(42, Ordering::Release);
}

fn thread1() {
    for _ in 0..4 {
        std::thread::sleep(Duration::from_millis(200));
        let value = ONE_WAY_DATA_EXCHANGE.load(Ordering::Acquire);
        println!("{}", value);
    }
}
0
0
42
42

Note that the code does not contain any unsafe; this is perfectly valid for the compiler to reason about and has (almost) no runtime overhead.


To demonstrate how little overhead this really causes:

#![no_std]

use core::sync::atomic::{AtomicU8, Ordering};

static SHARED_VALUE_ATOMIC: AtomicU8 = AtomicU8::new(0);

pub fn write_static_atomic(val: u8){
    SHARED_VALUE_ATOMIC.store(val, Ordering::SeqCst)
}

pub fn read_static_atomic() -> u8 {
    SHARED_VALUE_ATOMIC.load(Ordering::SeqCst)
}

static mut SHARED_VALUE_STATICMUT: u8 = 0;

pub fn write_static_staticmut(val: u8){
    unsafe {
        SHARED_VALUE_STATICMUT = val;
    }
}

pub fn read_static_staticmut() -> u8 {
    unsafe {
        SHARED_VALUE_STATICMUT
    }
}

The code compiles to the following, using the flags -C opt-level=3 -C linker-plugin-lto --target=thumbv6m-none-eabi:

example::write_static_atomic:
        dmb     sy
        ldr     r1, .LCPI0_0
        strb    r0, [r1]
        dmb     sy
        bx      lr
.LCPI0_0:
        .long   example::SHARED_VALUE_ATOMIC.0

example::read_static_atomic:
        ldr     r0, .LCPI1_0
        ldrb    r0, [r0]
        dmb     sy
        bx      lr
.LCPI1_0:
        .long   example::SHARED_VALUE_ATOMIC.0

example::write_static_staticmut:
        ldr     r1, .LCPI2_0
        strb    r0, [r1]
        bx      lr
.LCPI2_0:
        .long   example::SHARED_VALUE_STATICMUT.0

example::read_static_staticmut:
        ldr     r0, .LCPI3_0
        ldrb    r0, [r0]
        bx      lr
.LCPI3_0:
        .long   example::SHARED_VALUE_STATICMUT.0

example::SHARED_VALUE_ATOMIC.0:
        .byte   0

example::SHARED_VALUE_STATICMUT.0:
        .byte   0

An AtomicU8 on thumbv6m-none-eabi seems to have almost zero overhead. The only additions are the dmb sy instructions, which are memory barriers that prevent reordering around the access; using Ordering::Relaxed (if your problem allows it) should eliminate those, resulting in actual zero overhead. Other architectures should behave similarly.

Finomnis
  • Awesome. Thanks for the clarification. AtomicU8 looks like the way to go, assuming it is indeed almost no compute overhead (I’m looking to squeeze a lot of work out of these VERY limited MCU chips). Any reference anywhere of what the overhead of Atomics are? And, though it looks like Atomics are the way to go, out of curiosity, is it technically feasible to use static mut for simple 32bit or less numbers since the integer will be written within a clock cycle, or are there other factors that still screw things up? – risingtiger May 26 '23 at 14:44
  • @risingtiger This is still not safe. Compiler optimizations and out-of-order execution (assuming you don't use relaxed ordering), and RMW (Read-Modify-Write) actions like add may be broken. – Chayim Friedman May 27 '23 at 19:39
  • If you want zero overhead, you want two things: An architecture where writing to the value is really atomic (usually everything up to the native bit width; like, if your system is 32bit, then everything up to `u32`). Second, that you use the correct `Ordering`. The most relaxed one is `Ordering::Relaxed`, it is zero-overhead in most cases. I added an example of how little overhead it really causes in my answer. – Finomnis May 27 '23 at 19:49
  • Relaxed loads and stores are free on ARM and x86. RMW actions are not. – Chayim Friedman May 28 '23 at 08:17