17

I'd like to know some kind of file checksum (like a SHA-256 hash, or anything else) when I start downloading a file from an HTTP server. It could be transferred as one of the HTTP response headers.

The HTTP ETag is something similar, but it's used only for invalidating the browser cache and, from what I've noticed, every site calculates it in a different way and it doesn't look like any hash I know.

Some software download sites provide various file checksums as separate files to download (for example, the latest Ubuntu 16.04 SHA-1 hashes: http://releases.ubuntu.com/16.04/SHA1SUMS). Wouldn't it be easier to just include them in an HTTP response header and have the browser calculate the hash when the download ends (instead of forcing the user to do it manually)?

I guess the whole HTTP-based Internet works because we're using the TCP protocol, which is reliable and ensures the received bytes are exactly the same as the ones sent by the server. But if TCP is so "cool", why do we check file hashes manually (see the Ubuntu example above)? A lot of things can go wrong during a file download (client/server disk corruption, file modification on the server side, etc.). If I'm right, all of this could be fixed simply by passing the file hash at download start.
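To make the idea concrete, here is a hypothetical Python sketch of what I mean (the function name and the simulated "announced" checksum are mine, not any existing API): the client hashes the stream incrementally as chunks arrive and compares the result against a checksum the server announced up front.

```python
import hashlib

def verify_stream(chunks, expected_hex):
    """Hash a download incrementally and compare it to an announced checksum."""
    h = hashlib.sha256()
    for chunk in chunks:  # chunks arrive as the download progresses
        h.update(chunk)
    return h.hexdigest() == expected_hex

# Simulated download: the "server" announces the hash before sending the body.
payload = [b"hello ", b"world"]
announced = hashlib.sha256(b"hello world").hexdigest()
print(verify_stream(payload, announced))  # prints True
```

The point is that the browser could do this automatically at download end, with no extra pass over the file.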

Stephen Ostermiller
Greg Witczak
  • A [content-addressing](https://en.wikipedia.org/wiki/Content-addressable_storage) system would retrieve each file's hash before downloading it, to prevent unnecessary downloads of files that were already cached. – Anderson Green May 10 '23 at 22:44

3 Answers

9

Digest is the standard header (defined in RFC 3230) used to convey the checksum of a selected representation of a resource (that is, the payload body).

An example response with a digest:

>200 OK
>...
>Digest: sha-256=X48E9qOokqqrvdts8nOJRJN3OWDUoyWxBf7kbu9DBPE=
>
>{"hello": "world"}

Digest may be used in both requests and responses. It's good practice to validate the data against the digest before processing it.
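That validation step might look like this minimal Python sketch (the function name is mine; it assumes the header value is the base64 of the raw SHA-256 digest of the body, as in the example above):

```python
import base64
import hashlib

def check_digest(body: bytes, digest_header: str) -> bool:
    """Validate a response body against a `Digest: sha-256=...` header value."""
    # partition() splits at the first '=', so base64 padding stays in `value`
    algo, _, value = digest_header.partition("=")
    if algo.lower() != "sha-256":
        raise ValueError("unsupported digest algorithm: " + algo)
    expected = base64.b64encode(hashlib.sha256(body).digest()).decode()
    return value == expected

body = b'{"hello": "world"}'
header = "sha-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode()
print(check_digest(body, header))  # prints True
```

A client would reject the response (or retry the download) when this check fails.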

You can see the related page on MDN for an in-depth discussion of the payload body in HTTP.

> I guess that whole HTTP-based Internet is working, because we're using TCP protocol

No: on the web, integrity is ensured by TLS. Non-TLS communication should not be trusted. See RFC 8446.

Roberto Polli
6

A checksum provided separately from the file is used as an integrity check when doing a non-TLS or indirect transfer.

I think I understand your doubt, because I had the same question about these checksums; let's work it out.

There are two threats to consider:

  1. The file is corrupted during transfer
  2. The file is tampered with by an attacker

And three protocols are involved in this question:

  1. The HTTP protocol
  2. The SSL/TLS protocol
  3. The TCP protocol

Now we separate this into two situations:

1. The file provider and the client transfer the file directly: no proxy, no offline medium (USB disk).

The TCP protocol promises: the data the client receives is exactly the same as the data the server sent, verified by checksums and acknowledgements. (The TCP checksum is only 16 bits, though, so it catches most but not all corruption.)

The TLS protocol promises: the server is authenticated (it really is ubuntu.com) and the data is not changed by any middleman.

So there is no need to add a checksum header to the HTTP response when using HTTPS.

But when TLS is not enabled, forgery can happen: a bad guy in the middle can hand a bad file to the client.

2. The file provider and the client transfer the file indirectly: via a CDN, a mirror, or an offline medium (USB disk).

Many sites like ubuntu.com use a third-party CDN to serve static files, and the CDN server is not managed by ubuntu.com; http://releases.ubuntu.com/somefile.iso may redirect to http://59.80.44.45/somefile.iso.

Now the checksum must be provided out-of-band, because the connection is not authenticated and we don't trust it. A checksum header in the HTTP response is no help in this situation: the untrusted server could forge the header along with the file.
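The out-of-band check is just what `sha256sum --check` does with a SHA256SUMS file. A rough Python equivalent (the function names are mine; it assumes the coreutils listing format of `<hex digest>  <filename>` per line, and that the listing itself was fetched over an authenticated channel):

```python
import hashlib

def parse_sums(sums_text: str) -> dict:
    """Parse a SHA256SUMS-style listing: '<hex digest>  <filename>' per line."""
    table = {}
    for line in sums_text.strip().splitlines():
        digest, _, name = line.partition("  ")
        table[name.lstrip("*")] = digest  # '*' marks binary mode in coreutils
    return table

def verify(name: str, data: bytes, sums_text: str) -> bool:
    """Check downloaded bytes against the digest published for that filename."""
    return hashlib.sha256(data).hexdigest() == parse_sums(sums_text).get(name)

# Simulated: the listing comes from the origin site, the data from a mirror.
data = b"fake iso contents"
sums = hashlib.sha256(data).hexdigest() + "  *somefile.iso"
print(verify("somefile.iso", data, sums))  # prints True
```

The security of this scheme rests entirely on the listing coming from a trusted source, not from the mirror that served the file.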

tianzhipeng
0

The hashes on ubuntu.com and similar sites are there for two purposes:

  • to check the integrity of the file (yes, hypothetically the browser could check it for you)
  • to check the authenticity of the file, to detect tampering (e.g. an attacker could intercept your download request and serve you a malicious file; while you may be covered by HTTPS browser-side, that is not true for data at rest, e.g. on an external USB disk, where you may want to verify it by comparing the hashes)
Riccardo Galli
    The question is why the checksum isn't delivered in the same HTTP response as the data. – CodeCaster Jan 26 '17 at 19:05
  • The part about SHA1SUMS in the question implies that we could just remove those kinds of files, which led me to talk about security. Probably I should have just written a comment instead :-/ – Riccardo Galli Jan 26 '17 at 22:05