Getting file size before download on a proxy server

Question

We're working on building a multithreaded proxy file server in C: we receive a request and retrieve the requested file from another location using the libcurl library.

The library gives you the option of issuing a HEAD request to get some of the file's parameters, such as its size.

You can also get these parameters when you actually start serving the file.
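When the size is taken from the GET itself, it arrives in the response headers before any body bytes, so the proxy can act on it without a separate HEAD. Below is a minimal sketch, assuming the proxy registers a callback with libcurl's `CURLOPT_HEADERFUNCTION` and passes its state via `CURLOPT_HEADERDATA`; `struct transfer_meta`, `header_cb` and `has_prefix_ci` are hypothetical names. libcurl hands over each header line without a NUL terminator, so the parser stays within `size * nitems` bytes, ignores malformed values, and resets on every status line so a redirect's length is not mistaken for the final response's.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-transfer state, passed to libcurl via CURLOPT_HEADERDATA.
 * content_length is -1 until a valid Content-Length header has been seen. */
struct transfer_meta {
    long long content_length;
};

/* Case-insensitive check that buf (len bytes, not NUL-terminated) starts
 * with the lowercase string prefix. */
static int has_prefix_ci(const char *buf, size_t len, const char *prefix)
{
    size_t i;
    for (i = 0; prefix[i] != '\0'; i++) {
        if (i >= len || tolower((unsigned char)buf[i]) != prefix[i])
            return 0;
    }
    return 1;
}

/* Same signature as a CURLOPT_HEADERFUNCTION callback: libcurl calls it
 * once per header line, including the status line of every response. */
size_t header_cb(char *buffer, size_t size, size_t nitems, void *userdata)
{
    struct transfer_meta *meta = userdata;
    size_t len = size * nitems;
    const size_t name_len = strlen("content-length:");

    /* New status line (e.g. after a redirect): forget the old length. */
    if (has_prefix_ci(buffer, len, "http/")) {
        meta->content_length = -1;
        return len;
    }
    if (!has_prefix_ci(buffer, len, "content-length:"))
        return len;

    char digits[20];     /* room for 19 digits, the width of LLONG_MAX */
    size_t i = name_len, n = 0;
    while (i < len && (buffer[i] == ' ' || buffer[i] == '\t'))
        i++;
    while (i < len && isdigit((unsigned char)buffer[i])) {
        if (n == sizeof digits - 1)
            return len;  /* absurdly long number: ignore it */
        digits[n++] = buffer[i++];
    }
    while (i < len && (buffer[i] == ' ' || buffer[i] == '\t' ||
                       buffer[i] == '\r' || buffer[i] == '\n'))
        i++;
    if (n == 0 || i != len)
        return len;      /* empty value or trailing garbage: ignore it */

    digits[n] = '\0';
    errno = 0;
    long long v = strtoll(digits, NULL, 10);
    if (errno == 0)      /* ERANGE means it did not fit in long long */
        meta->content_length = v;
    return len;          /* any other return value aborts the transfer */
}
```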

A colleague has pointed out that doing a HEAD request and then immediately getting the file afterwards is wasteful. I agree with him, but I was wondering if there exists any use case where it might be useful to know the file size in advance?

e.g. choosing an optimal MTU.
e.g. setting thread priority if it is a big file.
e.g. reducing the overall number of threads in case a big file uses up too much memory.
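If the proxy does want to limit memory by file size, as in the last example, the bookkeeping could look roughly like this sketch: each transfer tries to reserve its declared length against a shared budget before buffering in RAM, and falls back (for example, to spooling to disk) when the length is unknown or too large. `MEMORY_BUDGET`, `reserve_buffer` and `release_buffer` are hypothetical names, and C11 atomics are assumed so worker threads can share the counter without a lock. The length can come from the GET's own `Content-Length` header, since headers arrive before the body, so a HEAD is not required for this. libcurl reports an unknown length as -1, which the sketch treats as "don't buffer in memory". The declared length comes from an untrusted peer, so the write path must still enforce the reservation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global budget for response bodies buffered in RAM (bytes). */
#define MEMORY_BUDGET (256LL * 1024 * 1024)

static atomic_llong bytes_reserved = 0;

/* Try to reserve `size` bytes of in-memory buffer for one transfer.
 * size < 0 means the length is unknown, so refuse and let the caller
 * stream or spool to disk instead. Safe to call from many threads. */
bool reserve_buffer(long long size)
{
    if (size < 0 || size > MEMORY_BUDGET)
        return false;
    long long cur = atomic_load(&bytes_reserved);
    /* cur and size are both <= MEMORY_BUDGET, so cur + size cannot overflow */
    while (cur + size <= MEMORY_BUDGET) {
        if (atomic_compare_exchange_weak(&bytes_reserved, &cur, cur + size))
            return true;
        /* on failure cur was reloaded with the current value; retry */
    }
    return false;
}

/* Return a reservation once the transfer has finished or been aborted. */
void release_buffer(long long size)
{
    atomic_fetch_sub(&bytes_reserved, size);
}
```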

In addition to this, are there security concerns with querying a file's size before retrieving it in the proxy file server scenario?

One very important thing to keep in mind is that you have no guarantee the file size will remain the same between the first request and the second request. So, once you've extracted the important information from the HEAD, you'll need to verify the information is actually the same for the GET. If they modify a resource in between requests, or they're using a load balancer and there's propagation delay for resource versioning between backend machines, that's a nontrivial possibility. — Parthian Shot, Mar 11 '16 at 20:13
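A minimal sketch of the check this comment describes, assuming the proxy has already parsed `Content-Length` and `ETag` from both responses into a hypothetical `struct resp_meta` (the function name `same_entity` is also hypothetical). A missing length is treated as unverifiable, and because a weak ETag (`W/"..."`) only promises semantic equivalence, a weak match is not accepted as proof of identical bytes, mirroring HTTP's strong comparison.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical metadata recorded from one response's headers.
 * content_length is -1 when absent; etag is "" when absent. */
struct resp_meta {
    long long content_length;
    char etag[128];
};

/* Returns true only if the GET response appears to describe the same
 * entity, byte for byte, as the earlier HEAD response did. */
bool same_entity(const struct resp_meta *head, const struct resp_meta *get)
{
    if (head->content_length < 0 || get->content_length < 0)
        return false;                 /* can't verify without a length */
    if (head->content_length != get->content_length)
        return false;
    if (head->etag[0] != '\0' || get->etag[0] != '\0') {
        if (strcmp(head->etag, get->etag) != 0)
            return false;             /* different (or missing) ETag */
        if (strncmp(head->etag, "W/", 2) == 0)
            return false;             /* weak ETag: bytes may differ */
    }
    return true;
}
```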
Since you're writing the server in C, if you rely on the length value being the same between requests, or on it being correct at any point, you risk a buffer overrun and arbitrary code execution if you hit the situation above or run into a buggy or malicious server. You'll want to look up questions on SO (not here) about writing and hardening secure C (e.g. compiling with support for stack canaries and ASLR, bounds checking, never passing untrusted data to printf as the format string, always passing string lengths along with the pointer). — Parthian Shot, Mar 11 '16 at 20:18
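One way to follow that advice in the download path, sketched under the assumption that the proxy buffers bodies through a libcurl `CURLOPT_WRITEFUNCTION` callback: the buffer's capacity is fixed by the proxy, and the callback checks every chunk against it instead of trusting `Content-Length`. `struct body_buf` and `write_cb` are hypothetical names. Returning a count different from the number of bytes passed in makes libcurl abort the transfer with `CURLE_WRITE_ERROR`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity body buffer, passed as CURLOPT_WRITEDATA.
 * cap is chosen by the proxy, never taken from the peer's headers.
 * Invariant: len <= cap. */
struct body_buf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Same signature as a CURLOPT_WRITEFUNCTION callback. */
size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct body_buf *b = userdata;

    if (nmemb != 0 && size > SIZE_MAX / nmemb)
        return 0;                     /* size * nmemb would overflow */
    size_t n = size * nmemb;

    if (n > b->cap - b->len)
        return 0;                     /* peer sent more than fits: abort */
    memcpy(b->data + b->len, ptr, n);
    b->len += n;
    return n;                         /* all bytes accepted */
}
```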
The MTU setting is a property of the network. There is no connection between the MTU setting and individual requests. — kasperd, Mar 12 '16 at 09:53

HBruijn · Accepted Answer · 2016-03-11T08:54:35.187

RFC 2616 says it all already:

9.4 HEAD

The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself.
This method is often used for testing hypertext links for validity, accessibility, and recent modification.

I would expect a proxy to primarily use the HEAD request to determine if an already cached object is still valid and may be returned instead of initiating a new download.

edited Mar 11 '16 at 08:54

answered Mar 11 '16 at 06:56

HBruijn

That's perfect, so there is certainly a use for it here. – user1658296 Mar 11 '16 at 06:57

Also relevant: [ETags](https://en.wikipedia.org/wiki/HTTP_ETag). – Parthian Shot Mar 11 '16 at 20:19
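With ETags, a proxy can often skip the separate HEAD: it sends a conditional GET carrying the cached copy's ETag in an `If-None-Match` header, and the origin answers `304 Not Modified` with no body if the copy is still current, or `200` with the new body if it is not. Below is a minimal sketch of building that header; `build_if_none_match` is a hypothetical helper, and the resulting string could be handed to libcurl with `curl_slist_append` and `CURLOPT_HTTPHEADER`. Because the ETag came from an untrusted origin, it is rejected if it contains CR or LF, which would otherwise let a hostile server inject extra request headers.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build "If-None-Match: <etag>" into out. Returns false if the ETag is
 * empty, contains CR/LF (header injection), or does not fit in out. */
bool build_if_none_match(char *out, size_t out_size, const char *cached_etag)
{
    if (cached_etag == NULL || cached_etag[0] == '\0')
        return false;
    if (strpbrk(cached_etag, "\r\n") != NULL)
        return false;
    int n = snprintf(out, out_size, "If-None-Match: %s", cached_etag);
    return n >= 0 && (size_t)n < out_size;   /* false if truncated */
}
```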