Description

I'm writing a Rust program that includes unarchiving a .tar.gz file. I followed the conventional approach, using crates tar and flate2:

// the function returns Result<(), Box<dyn Error>>
let file = std::fs::File::open(some_path)?;
let decoder = flate2::read::GzDecoder::new(file);
let mut arc = tar::Archive::new(decoder);
for entry in arc.entries()? {
    let mut file = entry?; // <- where it errs
    // ...
}

I then downloaded the target file from a source, say http://example.com/file.tar.gz, and the program returned an error:

numeric field did not have utf-8 text: _����~�h when getting cksum for �

I searched for the error message online, but none of the results matched my case. I did remember to decompress the file using GzDecoder, and the file didn't seem corrupted: I could double-click it and the system archive utility would unarchive it successfully. As I kept looking for what could be going wrong, I was puzzled by what I found.

Problem

Initially, I downloaded the .tar.gz file using Firefox. When I switched to Edge or cURL, my program raised no error. I compared the checksums of the files downloaded via each method: the Firefox download had a different checksum, while the other two matched each other.

I wonder how the download method could be the source of the problem. Even so, the system archive utility seemed undisturbed by the difference. I also wonder whether I can modify my code to avoid this problem.

Leonel Hou
  • Did you make sure that the file you read from disk is actually compressed? Depending on how you downloaded it, it might have been transparently decompressed, so that you are dealing with a plain tar file, not a tar.gz. Browsers often decompress transparently when the server sends a `Content-Encoding: gzip` header. This is wrong when the client is supposed to download the file in compressed form, but it is a common misconfiguration. – Steffen Ullrich Jul 20 '23 at 06:23
  • If you only tried this once, it might be that the archive has been corrupted during the download. It's pretty rare nowadays, but it's still a possibility. – jthulhu Jul 20 '23 at 07:24
  • @SteffenUllrich I did see the said header in the server response. However, the downloaded file doesn't seem entirely decompressed, as it still has the gzip identifier bytes (according to [this](https://superuser.com/questions/115902/tell-if-a-gz-file-is-really-gzipped)). I ran the `file` command on the files obtained from Firefox and Edge: the latter gave `gzip compressed data, max compression, original size modulo 2^32 56549888`, and the former `gzip compressed data, from Unix, original size modulo 2^32 21863538`. In short, Firefox seems to partly decompress the file during download. – Leonel Hou Jul 20 '23 at 07:24
  • It could also be decompression *after download* e.g. by default Safari automatically decompresses zip files after download. – Masklinn Jul 20 '23 at 08:10
  • @LeonelHou: would it be possible to provide the link in question to further debug what is happening? – Steffen Ullrich Jul 20 '23 at 08:11
  • @SteffenUllrich No problem, here is the [link](http://dice.weizaima.com/s/files/sealdice-core_1.2.6_darwin_amd64.tar.gz). It belongs to a project I'm contributing to. The site owner has said the website's API is not stable enough at the moment, though. – Leonel Hou Jul 20 '23 at 08:30
  • With Firefox I once got a download that was additionally compressed, i.e. gzip on top of .tar.gz (essentially .tar.gz.gz). But on later attempts, while trying to debug the issue, I got the real file without the added compression. So maybe there was some bad caching in the CDN (the site is using a CDN). – Steffen Ullrich Jul 20 '23 at 12:17
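
The comments point at two possibilities: the file on disk may be a bare tar (transparently decompressed) or may be gzip-wrapped more than once. One way to tell these apart before parsing is to sniff the leading bytes. The following is only a minimal sketch using the standard library; `Container`, `sniff` and `sniff_reader` are hypothetical names introduced here for illustration, not part of `tar` or `flate2`. It assumes the archive uses the POSIX/GNU header format, which stores the string "ustar" at offset 257; very old tar variants lack that marker and would be reported as `Unknown`.

```rust
use std::io::{self, Read};

/// First two bytes of every gzip member (RFC 1952).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Hypothetical classification of what a stream starts with.
#[derive(Debug, PartialEq)]
enum Container {
    Gzip,
    Tar,
    Unknown,
}

/// Classify a buffer holding the first bytes of a file or stream.
fn sniff(header: &[u8]) -> Container {
    if header.starts_with(&GZIP_MAGIC) {
        return Container::Gzip;
    }
    // POSIX and GNU tar headers carry "ustar" at offset 257 of the first block.
    if header.len() >= 262 && &header[257..262] == b"ustar" {
        return Container::Tar;
    }
    Container::Unknown
}

/// Read up to 512 bytes (one tar header block) from `reader` and classify them.
/// Note that this consumes those bytes, so reopen or rewind the source afterwards.
fn sniff_reader<R: Read>(reader: R) -> io::Result<Container> {
    let mut header = Vec::with_capacity(512);
    reader.take(512).read_to_end(&mut header)?;
    Ok(sniff(&header))
}
```

Applied to the code in the question, one could run `sniff_reader` on the opened file and then on the output of one `GzDecoder`. If the decoded stream still reports `Gzip`, the file is probably double-compressed and needs a second `GzDecoder` around the first before it is passed to `tar::Archive::new`; if the raw file already reports `Tar`, no decoder is needed at all.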
