snrm2 calculation instability for single-precision floats on Accelerate

Question

I'm trying to use snrm2 to perform a single precision float calculation in Rust. I'm linking to the Accelerate framework on OSX and using the blas crate for the C-bridge. Regardless of the randomly generated vector values (which usually calculate to [12, 13] in the native code) the snrm2 logic consistently returns:

-36893490000000000000
-2
36893490000000000000
0

This same code works as advertised when I switch everything to f64 and dnrm2. Is this a bug within Accelerate, or am I not meeting some of its internal assumptions about memory alignment or call arguments?

use rand::prelude::*;
use blas::*;
type Vector = [f32; 2048];

// Native implementation
fn euclidean_distance_standard(a: &Vector, b: &Vector) -> f32 {
    a.iter().zip(b.iter()).fold(0.0, |acc, (&x, &y)| {
        let diff = x - y;
        acc + diff * diff
    }).sqrt()
}

// Using snrm2 from BLAS
fn euclidean_distance_blas(a: &Vector, b: &Vector) -> f32 {
    let mut diff = [0.0; 2048];
    for i in 0..2048 {
        diff[i] = a[i] - b[i];
    }
    unsafe {
        snrm2(diff.len() as i32, &diff, 1)
    }
}

fn main() {
    let mut rng = rand::thread_rng();

    // Initialize a random vector
    let mut vector_a = [0.0; 2048];
    for i in 0..2048 {
        vector_a[i] = rng.gen::<f32>();
    }

    let vector_b = [0.5; 2048];

    let result_standard = euclidean_distance_standard(&vector_a, &vector_b);
    let result_blas = euclidean_distance_blas(&vector_a, &vector_b);

    println!("Standard Method Result: {}", result_standard);
    println!("OpenBLAS Method Result: {}", result_blas);
}

# build.rs
fn main() {
    println!("cargo:rustc-link-lib=framework=Accelerate");
}

# Cargo.toml
[dependencies]
rand = "0.8"
blas = "0.21"
blas-src = { version = "0.9", features = ["accelerate"] }
libc = "0.2.36"

$ cargo run --release
    Finished release [optimized] target(s) in 0.01s
     Running `target/release/similarity-comp`
Standard Method Result: 13.005363
OpenBLAS Method Result: -36893490000000000000

I don't have an answer as to why this is occurring for you, but I don't see this with a normal BLAS on Debian amd64/sid. Maybe try without the `accelerate` feature and see if that works as a way to troubleshoot the problem more precisely? — bk2204, Aug 31 '23 at 00:10
There is a long discussion about this bug here: https://fortran-lang.discourse.group/t/how-many-blas-libraries-have-this-error/4454 — IPribec, Aug 31 '23 at 14:42
This issue is also linked: https://stackoverflow.com/questions/50316681/blas-function-returns-zero-in-fortran90 — IPribec, Aug 31 '23 at 17:25

score 3 · Answer 1 · answered Aug 31 '23 at 15:24

I got in touch with Apple about this and they pointed me to their latest implementation of the Fortran77 interface. It returns doubles for all floating point values:

func cblas_snrm2(
    _ __N: Int32,
    _ __X: UnsafePointer<Float>!,
    _ __incX: Int32
) -> Float

The blas crate specifies the return value as f32 (so should probably be patched). But modifying the local foreign function interface like this gives the right answer, at least on arm architectures.

extern crate libc;
use libc::c_int;

extern "C" {
    fn snrm2_(n: *const c_int, x: *const f32, incx: *const c_int) -> f64;
}

pub unsafe fn snrm2(n: i32, x: &[f32], incx: i32) -> f64 {
    snrm2_(&n, x.as_ptr(), &incx)
}

Standard: 173.71765, OpenBLAS: 173.71761182650607

That's precisely the bug. The Reference BLAS `snrm2` returns a `float`, just like the BLAS crate. The fact `snrm2_` (the mangled name a Fortran compiler would emit) in Accelerate returns `f64` is a bug originating from the f2c conversion decades ago. You're better off relying on the public `cblas_` interface Apple documents. — IPribec, Aug 31 '23 at 15:37

score 0 · Answer 2 · answered Aug 31 '23 at 06:57

This is a bug but probably in the wrapper, I tried in a few different ways to compute that.

First hypothesis was that the error was in sgrm2, then I tried to compute the squared norm using sdot, and the same behavior is observed.

Then I considerd another routine returning an f32, the 1-norm, sasum, the buggy results are observed there as well.

Then I tried one function that receives a vector of f32 and returns an integer, isamax, and it works fine.

This happens both using feature accelerate or openblas.

So my hypothesis is that the wrapper has a problem when returning f32. What I did next was to pick a level 2 function sgemv that computes y = alpha * A * x + beta * y, setting alpha=1.0, beta=0.0, and A = x' we get y = x'*x that is the square of the norm. The difference is that this will be written to a mutable [f32] argument, instead of returned by the function. In this case we get the expected result.

Here is the complete program you can test on your side


extern crate blas;
use blas::*;

fn main() {
    let v1: [f32; 6] = [2.0, 0.0, 5.0, 4.0, 2.0, 0.0];
    let mut y1: [f32; 1] = [0.0];
    let v1d: [f64; 6] = [2.0, 0.0, 5.0, 4.0, -2.0, 0.0];
    let mut y1d: [f64; 1] = [0.0];
    let n = v1.len() as i32;
    unsafe {
        sgemv('N' as u8, 1, n, 1.0, &v1, 1, &v1, 1,   0.0, &mut y1, 1);
    };
    println!("snrm2={:.3}", unsafe { snrm2(v1.len() as i32, &v1, 1) });
    println!("|x|_1 via sasum={:.3}", unsafe { sasum(v1.len() as i32, &v1, 1) });
    println!("|x|^2 via sdot={:.3}", unsafe { sdot(v1.len() as i32, &v1, 1, &v1, 1) });
    println!("argmax isamax={}", unsafe { isamax(v1.len() as i32, &v1, 1) });
    println!("|x|^2 via sgemv={:.3}", y1[0]);
    
    unsafe {
        dgemv('N' as u8, 1, n, 1.0, &v1d, 1, &v1d, 1,   0.0, &mut y1d, 1);
    };
    println!("dnrm2={:.3}", unsafe { dnrm2(v1d.len() as i32, &v1d, 1) });
    println!("|x|_1 via dasum={:.3}", unsafe { dasum(v1d.len() as i32, &v1d, 1) });
    println!("|x|^2 via ddot={:.3}", unsafe { ddot(v1d.len() as i32, &v1d, 1, &v1d, 1) });
    println!("argmax via idamax={}", unsafe { idamax(v1d.len() as i32, &v1d, 1) });
    println!("|x|^2 via dgemv={:.3}", y1d[0]);
    
}

The input vector has 2-norm 7 (squared norm 49), and 1-norm 13.

The program output was correct for all fp64 routines, for fp32 routines only isamax and sgemv behaved as expected.

snrm2=0.000
|x|_1 via sasum=0.000
|x|^2 via sdot=0.000
argmax isamax=3
|x|^2 via sgemv=49.000
dnrm2=7.000
|x|_1 via dasum=13.000
|x|^2 via ddot=49.000
argmax via idamax=3
|x|^2 via dgemv=49.000

Which wrapper are you referring to, the Rust one or the Accelerate header? — IPribec, Aug 31 '23 at 14:55
It would also be helpful to know precisely which version of OpenBLAS you are linking. — IPribec, Aug 31 '23 at 15:05
I just used your same configuration, to be honest I am not a rust programmer. I took this as a tutorial and brought what I knew about openblas and algebra. — Bob, Sep 01 '23 at 07:33

snrm2 calculation instability for single-precision floats on Accelerate

2 Answers2