use std::fs::File;
use std::io::Read;

fn main() {
    let mut f = File::open("binary_file_path").expect("no file found");
    let mut buf = vec![0u8;15000*707*4];
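    // `read` may return after filling only part of the buffer; `read_exact` fills all of it or fails.
    f.read_exact(&mut buf).expect("Something went berserk");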
    let result: Vec<_> = buf.chunks(2).map(|chunk| i16::from_le_bytes([chunk[0],chunk[1]])).collect();
}

I want to read a binary file. The last line takes around 15s. I'd expect it to only take a fraction of a second. How can I optimise it?

John Kugelman
Selva
  • Did you compile in release mode? I.e. `cargo build --release` – Sven Marnach May 08 '20 at 12:25
  • And for clarification, the integers are stored as little endian in the file, and you want to convert them to the native endianness of the system? Or can you be sure that they are stored in the native endianness in the file? – Sven Marnach May 08 '20 at 12:28
  • Thanks Sven for the quick response. The file was generated earlier; I found out by trial and error that it is written in little endian (I have a script in Pascal that reads this file). I am not sure about the native endianness part. I just want to use the values as integers in my code – Selva May 08 '20 at 12:38
  • `cargo build --release` made a considerable difference, but it is still not as fast as my Pascal version – Selva May 08 '20 at 12:42
  • Hard to tell where the difference comes from. Your code looks like the compiler should be able to optimise it decently, but you can never tell without looking at the actual emitted code and measuring how fast it is. So how big is the difference to the Pascal code? – Sven Marnach May 08 '20 at 12:48
  • On my machine the debug version runs in 2.83s and release in 0.02s. How long does the release version take for you? – John Kugelman May 08 '20 at 12:49
  • I tried to measure the time in both Rust and Pascal, now it takes the same time. I don't know why. I also have one more issue now - I have 2 structs. Struct 1 has only values and Struct 2 has only references. Now I want struct 2 to share the struct 1 variables. Is it possible? – Selva May 08 '20 at 19:04
  • Post that as a new question. If this question is no longer reproducible you could delete it. – John Kugelman May 08 '20 at 19:08
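
To make the endianness point from the comments concrete, here is a minimal std-only sketch. `decode_i16_le` is a hypothetical helper name, and the buffer is synthetic (all zeros, same size as in the question) so that no file is needed. `i16::from_le_bytes` always interprets the two bytes as little endian, whatever the CPU's native byte order is, so the result is an ordinary native `i16`. The timing part is only a rough way to compare debug and release builds, not a proper benchmark.

```rust
use std::time::Instant;

/// Hypothetical helper: decode a byte slice holding little-endian i16 values.
/// A trailing odd byte, if any, is ignored by `chunks_exact`.
fn decode_i16_le(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

fn main() {
    // The byte order is fixed by the file format, not by the machine:
    // bytes 0x34 0x12 decode to 0x1234, and 0xFF 0xFF decode to -1.
    let sample = [0x34u8, 0x12, 0xFF, 0xFF];
    println!("{:?}", decode_i16_le(&sample)); // prints [4660, -1]

    // Synthetic stand-in for the file contents, same size as in the question.
    let buf = vec![0u8; 15000 * 707 * 4];
    let start = Instant::now();
    let result = decode_i16_le(&buf);
    println!("decoded {} values in {:?}", result.len(), start.elapsed());
}
```

Running this once with `cargo run` and once with `cargo run --release` should show the gap between the two build profiles for the conversion step alone.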

2 Answers


Your code looks like something the compiler should be able to optimise decently. Make sure that you compile it in release mode using `cargo build --release`. Converting roughly 40 MB of data to native endianness should only take a fraction of a second.

You can simplify the code and save some unnecessary copying by using the `byteorder` crate. It defines an extension trait, `ReadBytesExt`, for all implementors of `Read`, which allows you to call `read_i16_into()` directly on the file object.

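use byteorder::{LittleEndian, ReadBytesExt};
use std::fs::File;

fn main() {
    let mut f = File::open("binary_file_path").expect("no file found");
    // One i16 for every two bytes of the 15000 * 707 * 4 byte file.
    let mut result = vec![0i16; 15000 * 707 * 2];
    // Reads straight into `result` and interprets the data as little endian.
    f.read_i16_into::<LittleEndian>(&mut result).unwrap();
}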
Sven Marnach
  • What's slow about the OP's code and why is this faster? – John Kugelman May 08 '20 at 12:50
  • @JohnKugelman `read_i16_into()` reads from the file directly into the target buffer, and performs the endianness conversion in place, and only if necessary. So you need only a single allocation and no copying. – Sven Marnach May 08 '20 at 20:39
  • I actually benchmarked this with unsafe code that skipped the conversion - the speed was equivalent, and 3 times faster than the original solution - genuinely impressed this was as fast as the unsafe. – Richard Matheson May 08 '20 at 22:57
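
The idea described in the comments above (read directly into the target buffer, then fix the byte order in place and only if needed) can be sketched with the standard library alone. This is an illustration of the technique under stated assumptions, not the `byteorder` crate's actual source: `read_i16_le_into` is a hypothetical helper, and an in-memory `Cursor` stands in for the file.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: fill `dst` with little-endian i16 values from `reader`.
fn read_i16_le_into<R: Read>(reader: &mut R, dst: &mut [i16]) -> io::Result<()> {
    let byte_len = dst.len() * std::mem::size_of::<i16>();
    // View the i16 buffer as raw bytes so the reader can write into it directly.
    // SAFETY: u8 has alignment 1, every bit pattern is a valid i16, and
    // `byte_len` is exactly the size of `dst` in memory.
    let bytes = unsafe { std::slice::from_raw_parts_mut(dst.as_mut_ptr() as *mut u8, byte_len) };
    // Fails with an error if the source holds fewer bytes than requested.
    reader.read_exact(bytes)?;
    // `i16::from_le` is a no-op on little-endian targets and a byte swap otherwise.
    for v in dst.iter_mut() {
        *v = i16::from_le(*v);
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // In-memory stand-in for the file: 1, -2 and 256 encoded as little-endian i16.
    let mut src = Cursor::new(vec![0x01u8, 0x00, 0xFE, 0xFF, 0x00, 0x01]);
    let mut out = [0i16; 3];
    read_i16_le_into(&mut src, &mut out)?;
    println!("{:?}", out); // prints [1, -2, 256]
    Ok(())
}
```

Only one allocation is involved (the `i16` buffer itself), which is the saving over reading into a `Vec<u8>` and collecting into a second vector. In real code the crate's `read_i16_into` is the safer choice, since it keeps the `unsafe` out of your own program.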

`cargo build --release` improved the performance.

Selva