How to round up or down when converting f32 to bf16 in Rust?

Question

I am converting from f32 to bf16 in Rust and want to control the direction of the rounding error. Is there an easy way to do this?

Converting with the standard bf16::from_f32 rounds to the nearest bf16 value, which can be either larger or smaller than the original:

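use half::bf16;

fn main() {
    for x in [0.1_f32, 0.11, 0.12] {
        // Default conversion: rounds to the nearest representable bf16.
        let xh = bf16::from_f32(x);
        // Positive if the bf16 value is below x, negative if it is above.
        let diff = x - xh.to_f32();
        println!("{}, {}, {}", x, xh, diff);
    }
}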

Output:

0.1, 0.100097656, -0.00009765476
0.11, 0.10986328, 0.00013671815
0.12, 0.12011719, -0.00011719018

Only the documentation of `half` can tell you that, and it doesn't seem like it supports bespoke rounding modes. Either way, this is something which should be discussed with and reported to the crate maintainer. — Masklinn, Nov 03 '22 at 09:49
The `bf16` type is implemented purely in software. It's basically an `f32` with the significand truncated to eight bits. It doesn't look like the `half` crate supports any other rounding mode, so you either need to convince the author of the crate to implement that, or implement it yourself. The conversion function does [some simple bit twiddling](https://docs.rs/half/2.1.0/src/half/bfloat/convert.rs.html#4-22), so it's not difficult to implement this yourself. — Sven Marnach, Nov 03 '22 at 10:34
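
The bit twiddling mentioned in the comment above could be sketched roughly as follows. This is a minimal sketch, not part of the `half` crate: the function names `f32_to_bf16_bits_down`, `f32_to_bf16_bits_up`, `f32_to_bf16_bits_directed` and `bf16_bits_to_f32` are hypothetical, and the code works on raw `u16` bit patterns using only the standard library. It assumes that "down" means toward negative infinity and "up" means toward positive infinity.

```rust
/// Widen a bf16 bit pattern to f32. This direction is always exact.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Hypothetical helper: truncate toward zero, then step one bf16 ULP away
/// from zero if bits were discarded and the sign matches `bump_negative`.
fn f32_to_bf16_bits_directed(x: f32, bump_negative: bool) -> u16 {
    let bits = x.to_bits();
    // Sign, 8 exponent bits and the top 7 mantissa bits.
    let hi = (bits >> 16) as u16;
    if x.is_nan() {
        // Force a mantissa bit so a NaN cannot collapse into infinity.
        return hi | 0x0040;
    }
    let inexact = bits & 0xFFFF != 0;
    let negative = bits >> 31 == 1;
    if inexact && negative == bump_negative {
        // Cannot overflow: for non-NaN inexact inputs hi is at most 0xFF7F.
        // A carry into the exponent gives the next power of two or infinity.
        hi + 1
    } else {
        hi
    }
}

/// Round toward negative infinity: the result is never larger than `x`.
fn f32_to_bf16_bits_down(x: f32) -> u16 {
    f32_to_bf16_bits_directed(x, true)
}

/// Round toward positive infinity: the result is never smaller than `x`.
fn f32_to_bf16_bits_up(x: f32) -> u16 {
    f32_to_bf16_bits_directed(x, false)
}
```

With the `half` crate, the resulting bits can be wrapped using `bf16::from_bits`, for example `bf16::from_bits(f32_to_bf16_bits_down(x))`. Values beyond the largest finite bf16 round to infinity in the chosen direction, which may or may not be what a given application wants.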
