I have numerous storage servers, and then more "cache" servers which are used to load balance downloads. At the moment I use RSYNC to copy the most popular files from the storage boxes to the cache boxes, then update the DB with the new server IDs, so my script can route the download requests to a random box which has the file.
I'm now looking at better ways to distribute the content, and wondering if it's possible to route requests to any box at random, and the download script then check if the file exists locally, if it doesn't, it would "get" the file contents from the remote storage box, and output the content in realtime to the browser, whilst keeping the file on the cache box so that the next time the same request is made, it can just serve the local copy, rather than connecting to the storage box again.
Hope that makes sense(!)
I've been playing around with RSYNC, wget and cURL commands, but I'm struggling to find a way to output the data to browser as it comes in.
I've also been reading up on reverse proxies with nginx, which sounds like the right route... but it still sounds like they require the entire file to be downloaded from the origin server to the cache server before it can output anything to the client(?) some of my files are 100GB+ and each server has a 1gbps bandwidth limit, so at best, it would take 100s to download a file of that size to the cache server before the client will see any data at all. There must be a way to "pipe" the data to the client as it streams?
Is what I'm trying to achieve possible?