
I am calculating the checksum of a file using something like this:

use std::error::Error;
use std::fs;
use std::io::{BufRead, BufReader};

use ring::digest::{Context, SHA256};

pub fn sha256_digest(file_path: &str) -> Result<String, Box<dyn Error>> {
    let file = fs::File::open(file_path)?;
    let mut reader = BufReader::new(file);
    let mut context = Context::new(&SHA256);

    loop {
        let consumed = {
            let buffer = reader.fill_buf()?;
            if buffer.is_empty() {
                break;
            }
            context.update(buffer);
            buffer.len()
        };
        reader.consume(consumed);
    }

    let digest = context.finish();

    Ok(write_hex_bytes(digest.as_ref()))
}

Checksum:

dca3b9746da896f05072bdec6b788513029b26ab453b82e2e9d4365e56e2c913
Elapsed: 226.14ms
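As an aside, the `fill_buf`/`consume` pattern from the loop above can be exercised with just the standard library and an in-memory reader. A minimal sketch where byte counting stands in for hashing; `count_bytes` is a hypothetical helper name, not part of my program:

```rust
// Std-only demonstration of the fill_buf/consume pattern:
// fill_buf borrows the reader's internal buffer, consume marks it as read.
use std::io::{BufRead, BufReader, Cursor};

fn count_bytes<R: BufRead>(mut reader: R) -> std::io::Result<usize> {
    let mut total = 0;
    loop {
        let consumed = {
            let buffer = reader.fill_buf()?; // borrow the internal buffer
            if buffer.is_empty() {
                break; // an empty buffer from fill_buf means EOF
            }
            total += buffer.len();
            buffer.len()
        };
        reader.consume(consumed); // tell the reader those bytes were processed
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    let reader = BufReader::new(Cursor::new(vec![0u8; 100_000]));
    assert_eq!(count_bytes(reader)?, 100_000);
    Ok(())
}
```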

I already tried this with tokio:

use std::error::Error;

use futures::TryStreamExt;
use ring::digest::{Context, SHA256};
use tokio::fs::File;
use tokio_util::codec::{BytesCodec, FramedRead};

async fn sha256_digest(file_path: &str) -> Result<String, Box<dyn Error>> {
    let file = File::open(file_path).await?;
    let mut stream = FramedRead::new(file, BytesCodec::new());
    let mut context = Context::new(&SHA256);
    while let Some(bytes) = stream.try_next().await? {
        context.update(&bytes);
    }
    let digest = context.finish();
    Ok(write_hex_bytes(digest.as_ref()))
}

But unfortunately it takes about twice as long to calculate the checksum, so I would like to try an option with `tokio::io::BufReader`. The problem I am having is implementing the `AsyncBufRead` trait so that I can use the `consume` method.

Checksum:

dca3b9746da896f05072bdec6b788513029b26ab453b82e2e9d4365e56e2c913
Elapsed: 448.25ms

I am trying:

async fn sha256_digest_BufReader(file_path: &str) -> Result<String, Box<dyn Error>> {
    let file = File::open(file_path).await?;
    let mut reader = BufReader::new(file);
    let mut context = Context::new(&SHA256);
    loop {
        let consumed = {
            let buffer = reader.buffer();
            if buffer.is_empty() {
                break;
            }
            context.update(buffer);
            buffer.len()
        };
        reader.consume(consumed);
    }
    let digest = context.finish();
    Ok(write_hex_bytes(digest.as_ref()))
}

But getting this error:

error[E0599]: no method named `consume` found for struct `tokio::io::util::buf_reader::BufReader<tokio::fs::file::File>` in the current scope
  --> src/main.rs:63:16
   |
63 |         reader.consume(consumed);
   |                ^^^^^^^ method not found in `tokio::io::util::buf_reader::BufReader<tokio::fs::file::File>`

warning: unused import: `tokio::io::AsyncBufRead`

Any idea of how this could be implemented?
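Side note for future readers: if I understand the docs correctly, tokio 1.9+ added `fill_buf` and `consume` to `AsyncBufReadExt`, which are not available on the tokio 0.2 used here but would allow an almost direct port of the sync loop. A sketch under that assumption, reusing the `write_hex_bytes` helper from above:

```rust
// Sketch only: assumes tokio 1.9+ (where AsyncBufReadExt gained
// fill_buf/consume) and the ring crate; write_hex_bytes is the same
// helper used in the snippets above.
use std::error::Error;

use ring::digest::{Context, SHA256};
use tokio::fs::File;
use tokio::io::{AsyncBufReadExt, BufReader};

async fn sha256_digest_buf(file_path: &str) -> Result<String, Box<dyn Error>> {
    let file = File::open(file_path).await?;
    let mut reader = BufReader::new(file);
    let mut context = Context::new(&SHA256);
    loop {
        let consumed = {
            let buffer = reader.fill_buf().await?; // borrow the internal buffer
            if buffer.is_empty() {
                break; // EOF
            }
            context.update(buffer);
            buffer.len()
        };
        reader.consume(consumed); // mark the chunk as read
    }
    let digest = context.finish();
    Ok(write_hex_bytes(digest.as_ref()))
}
```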

Update: I am trying this:

async fn sha256_digest_BufReader(file_path: &str) -> Result<String, Box<dyn Error>> {
    let file = File::open(file_path).await?;
    let mut reader = BufReader::new(file);
    let mut context = Context::new(&SHA256);
    let mut buf: [u8; 8192] = [0; 8192]; // chunk size (8K, 65536, etc.)

    loop {
        let size = reader.read(&mut buf[..]).await?; // propagate I/O errors instead of silently stopping
        if size == 0 {
            break;
        }
        context.update(&buf[..size]);
    }
    let digest = context.finish();
    Ok(write_hex_bytes(digest.as_ref()))
}

But it still runs "slow", taking at least twice as long as the sync version, even after changing the chunk size, for example to 65536 or 131072.
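One possible explanation I have come across (an assumption on my part, not something I have profiled): `tokio::fs` dispatches each file read to a blocking thread pool, so a chunked async read pays a cross-thread round trip per chunk, while the sync version reads in place. If the goal is just to avoid blocking the executor, an alternative is to run the whole sync routine on the blocking pool. A sketch, again reusing the `write_hex_bytes` helper:

```rust
// Sketch: run the synchronous digest on tokio's blocking pool instead of
// doing chunked async reads. Assumes tokio and ring; write_hex_bytes is the
// helper from the snippets above.
use std::io::{self, BufRead, BufReader};

use ring::digest::{Context, SHA256};
use tokio::task;

async fn sha256_digest_blocking(file_path: String) -> io::Result<String> {
    task::spawn_blocking(move || -> io::Result<String> {
        let file = std::fs::File::open(&file_path)?;
        let mut reader = BufReader::new(file);
        let mut context = Context::new(&SHA256);
        loop {
            let consumed = {
                let buffer = reader.fill_buf()?;
                if buffer.is_empty() {
                    break;
                }
                context.update(buffer);
                buffer.len()
            };
            reader.consume(consumed);
        }
        Ok(write_hex_bytes(context.finish().as_ref()))
    })
    .await
    // a panicked or cancelled blocking task surfaces as a JoinError
    .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?
}
```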

  • `tokio::io::BufReader` already implements `AsyncBufRead`, and it handles your [consuming requirement implicitly](https://docs.rs/tokio/0.2.22/src/tokio/io/util/buf_reader.rs.html#109-133) via `AsyncRead`; you can use `AsyncReadExt` to `read` bytes. The default buffer size for `fill_buf` is declared here: https://docs.rs/tokio/0.2.22/src/tokio/io/util/mod.rs.html#79 – Ömer Erden Aug 17 '20 at 06:39
  • It can't find `consume` method because `consume` expects self as `Pin<&mut Self>`. – Ömer Erden Aug 17 '20 at 06:40
  • @ÖmerErden could you please help me better understand what a `Pin<&mut Self>` is? I am trying to grasp the idea but still not getting it – nbari Aug 17 '20 at 07:23
  • I don't think I can do it in a single comment; you can read [this](https://rust-lang.github.io/async-book/04_pinning/01_chapter.html), but that's not really related to your case. It is better if you stick to my first comment: wrap the tokio file with `let reader = tokio::io::BufReader::new(file)`, then use the reader to read chunks. – Ömer Erden Aug 17 '20 at 07:47
  • I am indeed currently stuck trying to read in chunks, mainly trying to find how to loop and read until `buffer.is_empty()` but within `tokio`, for example: `while let Ok(bytes) = reader.read_to_end(&mut buffer).await { do something }` but it never ends – nbari Aug 17 '20 at 08:00
  • It never ends because you are calling `read_to_end` continuously; `read_to_end` will read the whole file at once, then it will read 0 bytes on every subsequent call. You need to read chunks with `read`, please check: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d8fbd1627f627443ecc76e09a9817f8a – Ömer Erden Aug 17 '20 at 09:09
  • Thanks for the example, but now I think I need to flush the buffer in every iteration so that I can calculate the checksum of the whole file, `context.update(buffer);`, or find something that could do something similar to `consume` – nbari Aug 17 '20 at 09:18
  • You don't need to; just flush the buffer via `context.update`. On every `read` call (iteration), `BufReader` marks the buffer (chunk) as consumed implicitly. – Ömer Erden Aug 17 '20 at 10:13
  • try using `buf[..bytes]` to get actual chunks – Ömer Erden Aug 17 '20 at 10:52
  • For the speed, this might have an effect: your first example uses a chunk size of `8 * 1024`; I wrote `2048` as a placeholder, ref: https://github.com/rust-lang/rust/blob/master/library/std/src/sys_common/io.rs#L1 – Ömer Erden Aug 17 '20 at 11:02
  • hi @ÖmerErden, thanks for all the help; interesting that I have been using chunk sizes up to 65536 but still don't know why the "sync" version is twice as fast – nbari Aug 17 '20 at 11:07
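Regarding the `Pin<&mut Self>` question in the comments above: for a type that is `Unpin` (which `BufReader<File>` is), a `Pin` can be created safely with `Pin::new`, and methods on the pinned value stay callable through `DerefMut`. A minimal std-only illustration, unrelated to hashing:

```rust
use std::pin::Pin;

fn main() {
    let mut value = String::from("hello");
    // String is Unpin, so Pin::new is safe and cheap (no unsafe needed).
    let mut pinned: Pin<&mut String> = Pin::new(&mut value);
    // Pin<P> implements DerefMut when the target is Unpin,
    // so ordinary &mut methods remain callable through the Pin.
    pinned.push_str(" world");
    assert_eq!(value, "hello world");
    // This is why consume on AsyncBufRead, which takes Pin<&mut Self>,
    // can still be invoked on a plain, Unpin BufReader.
}
```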

0 Answers