
I have an 8 GB file on a server, and I want to download 1.5 GB of it using HTTP multi-range requests. I use curl.
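For context, a multi-range request is a single HTTP request whose `Range` header lists several byte ranges. A minimal sketch of building one in Python (the URL and offsets here are made up for illustration; the server must support byte ranges for this to return a `multipart/byteranges` response):

```python
import urllib.request

def make_range_header(ranges):
    """Format a list of (start, end) byte offsets (inclusive) into a
    single HTTP Range header value, e.g. 'bytes=0-99,500-599'."""
    return "bytes=" + ",".join(f"{s}-{e}" for s, e in ranges)

# Example: one big 500 MB range followed by a smaller range further in.
ranges = [(0, 500 * 2**20 - 1), (2**30, 2**30 + 2**20 - 1)]
req = urllib.request.Request("http://example.com/big.bin",
                             headers={"Range": make_range_header(ranges)})
# urllib.request.urlopen(req) would then return a multipart/byteranges
# body if the server honours the header (status 206 Partial Content).
print(req.get_header("Range"))
```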

The requested ranges are distributed uniformly across the file, except for the first request, which contains one big 500 MB range (there are 161 requests in total).

I found that the download time for this first request with the big range is ~40 s, while the total time is ~560 s. That means I download 500 MB in 40 seconds, but the remaining 1 GB takes 520 seconds, so the uniformly distributed requests are about 6x slower. I also noticed that the download rate drops by roughly 6-8x while these uniformly distributed requests are running.

I don't understand why this happens. The ranges in each request are sorted by increasing offset, so I don't see why there should be such a slowdown. Could you explain what can cause this? And moreover, how can I improve performance for such sets of requests?

I can provide the set of requests and their timings if needed.

brachistochron
1 Answer


You're not giving us much to work with, but you may want to check or consider the following points:

  • are you actually doing a single request with multiple ranges, or multiple requests, each with a separate range?

  • are you sure your server (and script, if it's a script) actually supports byte-range requests?

  • are you downloading a static file, or something generated dynamically by the server? If the latter, consider that each request may mean the server has to regenerate the full file before sending just the part you're interested in.

  • in any case, each request takes a little time to establish (TCP connection, SSL/TLS handshake if applicable, sending the HTTP request) before the actual download starts. This is especially true if you use separate curl invocations or keep-alive is disabled.

What is the reasoning behind the multiple range requests? Are you sure it wouldn't be faster and simpler to download the whole file (and possibly do some post-processing client-side)?
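If downloading the whole file isn't an option, one possible mitigation (a sketch, not something from the answer above) is to coalesce ranges that are close together before issuing the request: fewer parts means fewer server-side seeks and less multipart framing overhead, at the cost of downloading some unwanted bytes in the gaps.

```python
def coalesce(ranges, max_gap=64 * 1024):
    """Merge (start, end) byte ranges (inclusive) whose gap to the
    previous merged range is at most max_gap bytes."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] - 1 <= max_gap:
            # Close enough to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]

# The first two ranges are 50 bytes apart, so they merge; the third
# stays separate because the gap exceeds max_gap.
print(coalesce([(0, 99), (150, 199), (1_000_000, 1_000_099)], max_gap=100))
```

Tuning `max_gap` trades wasted bandwidth against per-part overhead; on a slow link you would keep it small, on a fast link with many scattered ranges a larger value may win.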

jcaron
  • 1) Yes, each request is a single request with multiple ranges. 2) I use the newest Apache server; as I understand it, it supports such requests out of the box. 3) It is a static file; the server can't change it. 4) Yes, I used separate curl invocations for testing, but as I understand it, the initialization overhead can't add up to a 6x slowdown. I can't download the whole file because this should also work on networks with low bandwidth. In my case the first request (with the big range) runs at around 13 MB/s but the rate drops to 2 MB/s for the rest of the requests. – brachistochron Dec 29 '15 at 15:29
  • Can you be more explicit about what exactly the different requests, and the ranges in each, are? You say multiple ranges, but then you say multiple requests... Can you provide specific examples of invocations (including a URL to download from, if possible)? – jcaron Dec 29 '15 at 15:40
  • Yes, I have multiple requests, each with multiple ranges. Here you can find an archive with a Python script that runs a batch of requests (from requests.txt) https://www.dropbox.com/s/25jbzwswak8o98f/test_set.zip?dl=0 and writes the timing and memory for each request to stdout. To reproduce it, you can just start a local Apache server from the official site and put some file (~2 GB) on it. I also reduced the requests to a 2 GB range, so for now I'm trying to download 1.5 GB from a 2 GB file. – brachistochron Dec 30 '15 at 07:41