
Suppose you have a binary file example.bin and want to read it in units of f64, i.e. the first 8 bytes give one float, the next 8 bytes give the next one, and so on (assuming you know the endianness). How can this be done in Rust?

I know that one can use std::fs::read("example.bin") to get a Vec<u8> of the data, but then you have to do quite a bit of "gymnastics" to convert each group of 8 bytes to an f64, i.e.

fn eight_bytes_to_array(barry: &[u8]) -> &[u8; 8] {
    barry.try_into().expect("slice with incorrect length")
}

let file_content = std::fs::read("example.bin").expect("Could not read file!");
let nr = eight_bytes_to_array(&file_content[0..8]);
let nr = f64::from_be_bytes(*nr);

I saw this post, but it's from 2015 and a lot has changed in Rust since then, so I was wondering if there is a better/faster way these days?

Sito

2 Answers


Here is an example without proper error handling. It also does not check for a file whose size is not a multiple of 8 bytes.

use std::fs::File;
use std::io::{BufReader, Read};

fn main() {
    // Use BufReader because `File` in std is unbuffered by default,
    // and issuing a separate read for every 8 bytes is a really bad idea.
    let mut input = BufReader::new(
        File::open("floats.bin")
        .expect("Failed to open file")
    );

    let mut floats = Vec::new();
    loop {
        use std::io::ErrorKind;
        // You may use 8 instead of `size_of` but size_of is less error-prone.
        let mut buffer = [0u8; std::mem::size_of::<f64>()];
        // Use read_exact because `read` may return fewer
        // than 8 bytes even if more bytes remain in the file.
        // This, however, prevents us from handling the case
        // where the file size is not divisible by 8.
        let res = input.read_exact(&mut buffer);
        match res {
            // We detect if we read until the end.
            // If there were some excess bytes after last read, they are lost.
            Err(error) if error.kind() == ErrorKind::UnexpectedEof => break,
            // Add more cases of errors you want to handle.
            _ => {}
        }
        // You should do better error-handling probably.
        // This simply panics.
        res.expect("Unexpected error during read");
        // Use `from_be_bytes` if the numbers in the file are big-endian.
        let f = f64::from_le_bytes(buffer);
        floats.push(f);
    }
}
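
The comments in the code note that `read_exact` cannot tell a clean end of file from a truncated last value. As a rough sketch of one way around that, the bytes can be collected with plain `read` calls so the number of leftover bytes is known. The function name `read_f64s_le` is made up for this illustration, and little-endian data is assumed as in the example above.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads little-endian f64 values until the end of the
/// input and returns an error if the input stops in the middle of a value.
fn read_f64s_le<R: Read>(mut reader: R) -> io::Result<Vec<f64>> {
    let mut floats = Vec::new();
    let mut buffer = [0u8; std::mem::size_of::<f64>()];
    loop {
        // Fill the buffer by hand so that we know how many bytes arrived
        // before the end of the input; `read_exact` does not report that.
        let mut filled = 0;
        while filled < buffer.len() {
            match reader.read(&mut buffer[filled..]) {
                Ok(0) => break, // end of input
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        match filled {
            // Clean end of input: no partial value is pending.
            0 => return Ok(floats),
            n if n == buffer.len() => floats.push(f64::from_le_bytes(buffer)),
            n => {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    format!("{} trailing byte(s) do not form a complete f64", n),
                ));
            }
        }
    }
}
```

For a file, the reader would still be wrapped in a `BufReader`, e.g. `read_f64s_le(BufReader::new(File::open("floats.bin")?))`.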

  • That match/res handling is quite convoluted. – Netwave Dec 23 '21 at 20:17
  • Thanks a lot for the answer. This seems like a very nice method if the file is not too big. – Sito Dec 23 '21 at 20:48
  • @Sito Why do you think it is wrong for big files? For small files it is easier to read the whole file into a byte buffer and then parse the bytes in chunks. My code should scale as long as there is memory for the floats vector, and you can get rid of that quite easily. – Angelicos Phosphoros Dec 23 '21 at 22:19
  • I have a file that is about 20 GB in size. Creating a buffer would require quite a lot of memory in this case, would it not? – Sito Dec 24 '21 at 00:11
  • @Sito A `BufReader` doesn't buffer the whole file, just enough that it doesn't have to fetch from the filesystem on each read call. Or if you mean that this code as written will read all the floats from the file, you can choose to stop reading whenever it is appropriate. – kmdreko Dec 24 '21 at 02:18
  • @Sito It buffers only a small chunk of the file in order to make fewer syscalls. E.g. with unbuffered IO you would need `268_435_456` syscalls to read a 2 GiB file of floats. The current default buffer size is 8 KiB, so it would read the file in chunks of 8 KiB, which results in `262_144` syscalls. I recently read a decent article on the topic: https://era.co/blog/unbuffered-io-slows-rust-programs – Angelicos Phosphoros Dec 24 '21 at 10:35
  • @AngelicosPhosphoros That was a very nice article. Thanks a lot! – Sito Dec 24 '21 at 13:39
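
The comments above mention that, for small files, it is easier to read the whole file into a byte buffer and then parse it in chunks. A minimal sketch of that idea, assuming little-endian data, could use `chunks_exact`. The name `f64s_from_le_bytes` is invented for this illustration.

```rust
/// Hypothetical helper: decodes a byte slice as consecutive little-endian
/// f64 values. Returns `None` if the length is not a multiple of 8.
fn f64s_from_le_bytes(bytes: &[u8]) -> Option<Vec<f64>> {
    const SIZE: usize = std::mem::size_of::<f64>();
    let chunks = bytes.chunks_exact(SIZE);
    // `remainder` holds the bytes that did not fit into a full chunk.
    if !chunks.remainder().is_empty() {
        return None;
    }
    Some(
        chunks
            .map(|chunk| {
                // `chunks_exact` yields slices of exactly SIZE bytes,
                // so copying into a fixed-size array cannot panic.
                let mut array = [0u8; SIZE];
                array.copy_from_slice(chunk);
                f64::from_le_bytes(array)
            })
            .collect(),
    )
}
```

It would be fed with the result of `std::fs::read("example.bin")`, which holds the entire file in memory and is therefore only suitable when the file comfortably fits in RAM.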

I would create a generic iterator that returns f64 for flexibility and reusability.

use std::{fs, io};

struct F64Reader<R: io::BufRead> {
    inner: R,
}

impl<R: io::BufRead> F64Reader<R> {
    pub fn new(inner: R) -> Self {
        Self{
            inner
        }
    }
}

impl<R: io::BufRead> Iterator for F64Reader<R> {
    type Item = f64;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buff: [u8; 8] = [0;8];
        self.inner.read_exact(&mut buff).ok()?;
        Some(f64::from_be_bytes(buff))
    }
}

This means that if the file is large, you can loop through the values without storing them all in memory:

let input = fs::File::open("example.bin")?;
for f in F64Reader::new(io::BufReader::new(input)) {
    println!("{}", f)
}

Or, if you want all the values, you can collect them:

let input = fs::File::open("example.bin")?;
let values : Vec<f64> = F64Reader::new(io::BufReader::new(input)).collect();
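
Note that `.ok()?` in `next` ends the iteration on any error, so a truncated file or a failing disk looks the same as a normal end of input. If that matters, one possible variant (a sketch only, with the made-up name `TryF64Reader`) yields `io::Result<f64>` and uses `fill_buf` to detect a clean end of input:

```rust
use std::io::{self, BufRead};

/// Hypothetical variant of `F64Reader` that reports I/O errors instead of
/// silently ending the iteration.
struct TryF64Reader<R: BufRead> {
    inner: R,
}

impl<R: BufRead> TryF64Reader<R> {
    pub fn new(inner: R) -> Self {
        Self { inner }
    }
}

impl<R: BufRead> Iterator for TryF64Reader<R> {
    type Item = io::Result<f64>;

    fn next(&mut self) -> Option<Self::Item> {
        // An empty buffer from `fill_buf` means a clean end of input.
        match self.inner.fill_buf() {
            Ok(available) if available.is_empty() => return None,
            Ok(_) => {}
            Err(e) => return Some(Err(e)),
        }
        let mut buff = [0u8; 8];
        // A truncated final value surfaces here as `UnexpectedEof`.
        Some(
            self.inner
                .read_exact(&mut buff)
                .map(|()| f64::from_be_bytes(buff)),
        )
    }
}
```

Collecting into `io::Result<Vec<f64>>` then stops at the first error, e.g. `let values: io::Result<Vec<f64>> = TryF64Reader::new(io::BufReader::new(input)).collect();`.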
pigeonhands
  • Maybe you should use the [BufRead](https://doc.rust-lang.org/std/io/trait.BufRead.html) trait bound. Currently you still create an unnecessary 8 KiB buffer even if the user already has a buffered object. With a BufRead requirement, the user wouldn't be able to pass a plain File object, so the API is harder to misuse while still being efficient. – Angelicos Phosphoros Dec 23 '21 at 20:28
  • @AngelicosPhosphoros Thanks for the suggestion! I didn't know that trait existed. – pigeonhands Dec 23 '21 at 20:42
  • Thank you for the answer! Very cool method. Just one question: in the line `let input = fs::File::open("example.bin")?;` my editor complains about the `?` operator. Handling the error explicitly worked of course, but I'm not sure why the operator does not work in this case. – Sito Dec 23 '21 at 20:48
  • @Sito It's shorthand for error checking in Rust. You can replace it with `.unwrap()` to get it to compile, but you should probably read through [Recoverable Errors in rust](https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html) – pigeonhands Dec 23 '21 at 20:53
  • Thanks for the link in the book, but I'm still only at Chp. 6, so it might take me a while to get there and understand some things you did in your code. ^^ – Sito Dec 23 '21 at 20:59
  • @Sito tl;dr `?` roughly de-sugars to `let input = match fs::File::open("example.bin") { Ok(r) => r, Err(e) => return Err(e.into()) };`, so the function that it is used in needs to return a `Result`. My main function is defined as `fn main() -> Result<(), Box<dyn Error>>` – pigeonhands Dec 23 '21 at 21:04
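
To make the point about `?` concrete, here is a small sketch of a function that is allowed to use the operator because it returns a `Result`. The helper name `first_f64` is hypothetical, and big-endian data is assumed as in the answer.

```rust
use std::error::Error;
use std::fs::File;
use std::io::Read;

/// Hypothetical helper: reads the first big-endian f64 from a file.
fn first_f64(path: &str) -> Result<f64, Box<dyn Error>> {
    // If `open` fails, `?` returns early from `first_f64` with the error,
    // converted into `Box<dyn Error>`.
    let mut input = File::open(path)?;
    let mut buff = [0u8; 8];
    // The same applies here if the file holds fewer than 8 bytes.
    input.read_exact(&mut buff)?;
    Ok(f64::from_be_bytes(buff))
}
```

Inside a function that returns `()`, such as a plain `fn main()`, the same lines would not compile, which matches the editor complaint described above.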