Scaling large file downloads?

Question

We currently deliver large (1GB+) files via a single Apache server, but our Apache server is extremely disk-IO-bound and we need to scale.

My first idea was to simply duplicate this Apache server, however our file library is too big to simply horizontally scale the Apache server N-times.

So my next idea was to have two Apaches (highly-available) in the backend, each with a separate copy of our entire library, then "N" reverse proxies in front, where "N" grows as our delivery needs grow. Each reverse proxy is very RAM heavy and has as many spindles per GB as possible. The backend Apache servers are more "archival" and have a low spindle-to-GB ratio.

Is this a good architecture? Is there a better way to handle it?

How much data do you have? – theotherreceive Jul 15 '09 at 15:23

score 6 · Answer 1 · answered Jul 15 '09 at 14:24

This is not a bad architecture (Squid is a popular reverse proxy). However, if you expect exponential growth, a Content Delivery Network could be a better solution: you only pay for the content you actually serve, bandwidth scales instantly (you don't have to scale out more servers), and transfers are served from servers as close to the client as possible, which helps clients get maximum transfer speeds. That said, I've never tried this with 1GB files, and the cost may be prohibitive.

Torrent technology can be considered as a p2p CDN in this case, and as such some of these providers may be suitable as torrent seeds for your content, reducing your total bandwidth costs and (possibly) increasing speed, although that's dependent on your leechers.

Andy

This is actually the other option we're considering. At the moment I'm trying to work out how the cost of scaling out ourselves (once we're sure on how to do it) compares to going with a CDN. Thanks for the CDN P2P links, I hadn't looked into those. – Jul 15 '09 at 23:52
++ for CDN ! It's a very nice way to scale. – Martin K. Jul 20 '09 at 19:30
For a good reason not to use Squid, see this from the Varnish folks: http://varnish.projects.linpro.no/wiki/ArchitectNotes – reconbot Jul 20 '09 at 19:52

score 2 · Answer 2 · answered Jul 15 '09 at 13:59

If you aren't currently doing this, it may be worth investigating delivering the files optionally over bittorrent to push some of the load off of your servers and onto the P2P network.

Thunder3

Incidentally, Amazon S3 has built-in BitTorrent serving. – ceejayoz Jul 15 '09 at 14:45

score 2 · Answer 3 · answered Jul 15 '09 at 14:56

My question is how do you know you're IO bound to begin with? It just strikes me as odd that your disks can't keep up with downloads over HTTP (assuming this is the case here and not HTTPS).

If you have a large user base then a CDN solution seems applicable, as others have pointed out. We use Akamai for load distribution. The assumption here is that you're serving these files up over the PI (public Internet) rather than through some internally hosted solution on a 100Mb or 1000Mb switched network.

Is it possible you are perceiving slow downloads as a disk IO issue when it might be an Internet bandwidth issue instead? (again, assumption being this is a PI facing site).

There are so many ways to increase disk IO - you can use SAN or RAID; both provide some level of caching. I can't think of any Internet connection that would outstrip the capacity of a single SAN HBA or Dual SAN HBA (teamed) running at 2Gb/s/hba or local storage via RAID with Cache backing connected via a PCI-E bus.

Are we talking Gig-E connected clients to the same connected server?

The network is at about half capacity. I assume we're IOPS bound from looking at vmstat/iostat output. We can easily saturate the network with a smaller active file set, but the problem is that 90% of our files are "hot" at any given time. We have a lot of clients, some of them slow, and the files are big. We've maxed the RAM on this box. There's room to improve the current disk subsystem (giving it more spindles per GB), but that's still scaling vertically, and I'm honestly not sure what we could do after that without changing architecture. – Jul 15 '09 at 23:07
Caching, in my opinion, is not going to fix a heavily IO-bound system unless you have tons of RAM to keep most of these "large" files in memory, which seems unlikely. How large is "large"? "Large" to me implies GB-size files requiring many disk flushes. What is your disk subsystem today (hardware RAID, software RAID, SAN, NAS, what)? Post your vmstat and iostat dumps. If this is Linux, also post your hdparm -t and -T output for each disk, and possibly some fio tests if you can. – Kilo Jul 17 '09 at 18:19

score 1 · Answer 4 · answered Jul 15 '09 at 14:25

What you are trying to scale here is your IO.

Using a caching proxy like squid or varnish is a way to add spindles for the content that is actually requested: the cache populates itself on demand, so you don't replicate the rarely or never used files in your archive. CDN devices do this for you too. Are these files media? CDN devices can do streaming for you as well.
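To make the "populate on demand" idea concrete, here is a minimal sketch of a size-bounded cache index. It assumes plain LRU eviction, which is only an illustration; squid and varnish have their own replacement policies. The class name and the byte sizes are made up for the example.

```python
from collections import OrderedDict


class ByteBoundedLRU:
    """Tracks which files a cache of limited size would currently hold."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._sizes = OrderedDict()  # path -> size, least recently used first

    def request(self, path, size):
        """Record a request. Return True on a cache hit, False on a miss."""
        if path in self._sizes:
            # Hit: mark the file as most recently used.
            self._sizes.move_to_end(path)
            return True
        if size > self.capacity_bytes:
            # Too large to ever fit; it is always served from the backend.
            return False
        # Miss: evict least recently used files until the new one fits.
        while self.used_bytes + size > self.capacity_bytes:
            _, evicted_size = self._sizes.popitem(last=False)
            self.used_bytes -= evicted_size
        self._sizes[path] = size
        self.used_bytes += size
        return False
```

Replaying an access log through `request()` gives a rough feel for the hit rate a given cache size would achieve. If 90% of the library really is hot, expect the hit rate to stay low until the cache is close to the size of the hot set.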

Do users get file download failures and re-attempt a download often? A high retry rate will greatly increase your IO needs.

Do you have any control over how the files are fetched? A download manager can fetch each file in separate chunks, thereby splitting the request over several Apache servers over time (though it could also download the chunks in parallel, saturating your Internet pipe).
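For reference, chunked fetching relies on HTTP Range requests. A rough sketch of how a client might split a file into ranges follows; the function name and chunk size are hypothetical, and it assumes the server honours Range headers for these files.

```python
def byte_ranges(total_size, chunk_size):
    """Return the Range header values needed to cover total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    ranges = []
    start = 0
    while start < total_size:
        # Range end offsets are inclusive, hence the -1.
        end = min(start + chunk_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges
```

Each value would go into a `Range:` request header, and each request could in principle be answered by a different server holding the same file.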

As an 'experience' reference, I've only ever been in environments that place all that data onto a NAS (NetApp in particular) and use Apache servers with NFS to deliver the files (though there were many smaller files, not 1GB ones). We also used a CDN as a caching proxy to stream video.

Can you recommend a particular CDN device, just to seed my search? These sound interesting. To answer your question, we don't have control over how the files are fetched. We also have no control over the client speed, some are very slow. — , Jul 15 '09 at 23:33
We used a Cisco CDN device, but oddly they don't seem to be in that space anymore (this was circa 2002). See Wikipedia for more references: http://en.wikipedia.org/wiki/Content_delivery_network It seems possible that Cisco just renamed their stuff CDS (content delivery system) instead? – ericslaw Jul 16 '09 at 06:17

score 1 · Answer 5 · answered Jul 17 '09 at 02:32

One possible architecture I have seen uses nginx as the frontend and it is backed by multiple Varnish instances. There was also consideration of adding a second level primary varnish to that arch (i.e. varnish pulls from main varnish).

Other than that, you should consider use of a CDN as others have mentioned. Depending on what you are serving (media?), there are some specialized CDNs that focus more on delivering large files such as BitGravity.

score 1 · Answer 6 · answered Jul 20 '09 at 19:29

What storage systems do you have your data on? And can it be partitioned?

Having back end servers on physically different SANs, each with a subset of the data, serving to reverse proxy front end machines would split up your data physically and still let it be addressed logically the same way from the outside.

Nginx is very good with memory consumption and static file serving as it can offload to the kernel using sendfile(). Lighttpd should also be looked at but I have heard it is less stable (re. memory consumption) but have not used it.

The front end servers can split requests across back end servers by path or pattern (Nginx has great regex support) and can even redirect to different data centers if you had the need. DNS round robin might also be useful.
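The routing decision itself is simple. Here is a sketch of it in Python rather than Nginx config, just to show the logic: match the path against patterns to find a pool of back ends, then hash the path so the same file always lands on the same server. All host names and patterns below are invented for the example.

```python
import re
import zlib

# Hypothetical layout: which back end pool holds which part of the library.
ROUTES = [
    (re.compile(r"^/iso/"), ["http://store-a.example", "http://store-b.example"]),
    (re.compile(r"\.(mp4|mkv)$"), ["http://store-c.example"]),
]
DEFAULT_POOL = ["http://store-d.example"]


def pick_backend(path):
    """Choose a back end for a request path; the first matching pattern wins."""
    pool = DEFAULT_POOL
    for pattern, backends in ROUTES:
        if pattern.search(path):
            pool = backends
            break
    # A stable hash keeps each file on one server within its pool.
    index = zlib.crc32(path.encode("utf-8")) % len(pool)
    return pool[index]
```

Note that with a plain modulo, adding a server to a pool remaps most files. If the pools change often, consistent hashing would be a better fit.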

I've successfully used Nginx to reverse proxy as a failsafe between slowly syncing datasets. It would test for the file and if it didn't exist it would ask the back end server over http. Later it would be synced to the front end machines and would work normally.
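The fallback logic described here boils down to "serve the local copy if it exists, otherwise go to the back end". A minimal sketch of that decision in Python follows; the function name, directory and origin URL are placeholders, and the real setup did this inside Nginx rather than in application code.

```python
import os


def resolve(path, local_root, origin_base):
    """Return ("local", file_path) if synced already, else ("origin", url)."""
    relative = path.lstrip("/")
    root = os.path.abspath(local_root)
    candidate = os.path.abspath(os.path.join(root, relative))
    # Refuse request paths that would escape the local root (e.g. "../").
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError("path escapes local root")
    if os.path.isfile(candidate):
        return ("local", candidate)
    # Not synced to this front end yet: fetch from the back end over HTTP.
    return ("origin", origin_base.rstrip("/") + "/" + relative)
```

Once the sync catches up, the same request starts returning `"local"` with no configuration change, which is what makes this work as a failsafe.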

Whatever you do, make sure you monitor the stats across the board: memory, IO wait, bandwidth, latency, average request times, etc. Without monitoring, whatever you do is shooting in the dark.
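As one small example of the IO wait part, on Linux you can take two samples of the aggregate `cpu` line from /proc/stat and compute the share of time spent in iowait between them. This is only a sketch (the function name is made up, and tools like iostat or vmstat already report the same figure), but it shows where the number comes from.

```python
def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two /proc/stat 'cpu' lines."""

    def counters(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(value) for value in parts[1:]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    # Fields: user nice system idle iowait irq softirq steal guest guest_nice.
    # Guest time is already included in user/nice, so sum only the first eight.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return deltas[4] / total
```

Sample the line a few seconds apart during peak download hours. A consistently high value supports the theory that the box is disk-bound rather than network-bound.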

score 1 · Answer 7 · answered Jul 21 '09 at 08:18

What we do is use MogileFS to redundantly store our files (redundancy and scalability by having each file on multiple servers), but have user accesses go through a CDN for speed and ... well, more scalability.

We use a smaller CDN, PantherExpress -- their pricing is good and the feature set is just great. Limelight Networks and EdgeCast also gave us good price quotes when we were shopping around.

I liked that PantherExpress gives you good technical documentation on their features, and that you get all the features they have for one price rather than paying a little extra for this and some more for that.
