
The title probably makes little sense, so here is an example.

I have a file hosting site that serves a large number of semi-randomly accessed files.

The setup is as follows:

  • High-horsepower front-end + DB server that also does encoding for files that need encoding.
  • "Fresh" file server, which stores newly uploaded content (usually the most rapidly accessed); it has 500 GB of RAIDed SSD storage and can push over 3 Gbit of traffic.
  • Three cheap node servers, each containing 2 × 750 GB SATA drives in RAID 1, to which files older than two weeks are archived from the SSD server mentioned above.

Files on each server are accessed via subdomains (via modsec) in a straightforward fashion (server1.domain.com, server2.domain.com, etc.).

Here is where I have the problem. I introduced a "premium" service where people pay a small fee every month and get ad-free, quick access to stuff on the site. Once they are logged in, they access the same files via premium.server1.domain.com through a different modsec script with a different pass phrase. That all works fine and dandy... except that the cheap node servers are all IO bound, so accessing the files on them via a different, unsaturated network makes no difference: the drives cannot be read fast enough.
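For concreteness, here is a minimal sketch of how the two routes could hand out per-vhost download tokens, assuming the "modsec" scripts follow lighttpd's classic mod_secdownload scheme (an MD5 token over a secret, the path, and a hex timestamp). The host names, secrets, and path are placeholders, not details from my actual setup:

```python
# Hedged sketch: time-limited download URLs in the style of lighttpd's
# mod_secdownload. Free and premium users get the same file over different
# vhosts, each with its own pass phrase (secret).
import hashlib
import time

def secure_url(host: str, secret: str, rel_path: str) -> str:
    ts_hex = f"{int(time.time()):08x}"  # request time as 8-digit hex
    token = hashlib.md5((secret + rel_path + ts_hex).encode()).hexdigest()
    return f"http://{host}/dl/{token}/{ts_hex}{rel_path}"

# Free users take the shared (often saturated) route; premium users fetch
# the same file via a different vhost and a different pass phrase.
print(secure_url("server1.domain.com", "free-secret", "/files/video.mp4"))
print(secure_url("premium.server1.domain.com", "premium-secret", "/files/video.mp4"))
```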

What would be a good way to make files on the site accessible via two different network routes, one of which (the "free" network) will be saturated, while everything else stays on an unsaturated "premium" network?

1 Answer


Hang on, your question pertains to something you already answered halfway through your explanation. As you said, your problem isn't saturation of the network adapters; you're limited by IO on the SATA drives. Or am I misreading?

Assuming that's right, you may be able to make some improvements depending on the access patterns for your older files. If you tend to get 'runs' on a single file at once (e.g. a link to the file is posted on a blog, and suddenly you're getting 500 unique IPs requesting the same file), then you should move that file into either a memory or pagefile cache, or stage it across to the SSD server before you serve it out.
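To make the staging idea concrete, here is a minimal sketch of a hot-file detector using a hit-count threshold over a sliding window. The threshold, window, and staging path are illustrative assumptions, not values from the question:

```python
# Sketch: count recent hits per file and stage anything that spikes onto
# the SSD tier, so "runs" on one file stop hammering the SATA spindles.
import os
import shutil
import time
from collections import defaultdict, deque

HOT_THRESHOLD = 50                # hits within the window that mark a file "hot"
WINDOW_SECONDS = 60
SSD_STAGING = "/mnt/ssd/staging"  # hypothetical fast-tier path

hits = defaultdict(deque)         # path -> timestamps of recent requests

def record_request(path: str) -> None:
    now = time.time()
    q = hits[path]
    q.append(now)
    # Drop timestamps that fell out of the sliding window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= HOT_THRESHOLD:
        stage_to_ssd(path)
        q.clear()  # avoid re-staging on every subsequent hit

def stage_to_ssd(path: str) -> None:
    # Copy once; serving then switches to the SSD tier (e.g. via a redirect).
    # In practice this would rsync to the SSD server rather than copy locally.
    dest = SSD_STAGING + path
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy2(path, dest)
```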

A similar question was asked recently and I explored other possible solutions: Windows Server 2003 - Handling hundreds of simultaneous downloads

Chris Thorpe
  • Well, what I thought I could do is either: put SAS drives in instead of SATA, so each node could push ~600-700 Mbit instead of ~250 Mbit with SATA... or install a cache server between the SSD box and the archive boxes that would cache rapidly accessed files from the archive boxes (a minimal sketch of such a cache follows these comments). Except I don't know how to implement that. –  Mar 31 '10 at 04:52
  • What kind of hardware are the node servers running, and what drive controllers? SAS should net you a performance gain, but it seems to be more dependent on the controller than the drives. – Chris Thorpe Mar 31 '10 at 05:07
  • It's not the controller. It's 2 × 7200 RPM SATA drives in RAID 1, using some RocketRAID card. Completely random IO, sustaining about 250 Mbit with lots and lots of IO. –  Mar 31 '10 at 05:12
  • Are your files duplicated across the node servers, or spread? Sounds like you'd be better off with a spread where multiple clients hitting the same file would only be touching a single node server. – Chris Thorpe Mar 31 '10 at 05:53
  • Seems like the core issue is seek times, in which case you need to pursue caching solutions (such as increased RAM, changes in your web app's behavior, or staging to the SSD machines) rather than switching to SAS (which won't improve your seek times significantly) or scaling out in terms of disk. – Chris Thorpe Mar 31 '10 at 06:07
  • Get proper hardware. 2 × 7200 RPM on a low-end card is not going to cut it. Upgrade to a proper SAS card and get a proper storage subsystem: many disks, 10k RPM. WD has nice VelociRaptors, and Supermicro has a chassis that can fit 23 of them into 2 rack units. You will be surprised at the IO load that can handle. – TomTom Jul 23 '10 at 23:55
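Following up on the cache-server idea from the comments, here is a minimal sketch of a caching reverse proxy that could sit in front of an archive node, so repeated requests are served from fast local storage instead of the IO-bound SATA drives. The upstream host, port, and cache directory are placeholder assumptions, and a production setup would more likely use an off-the-shelf proxy cache (e.g. Squid or nginx's proxy_cache):

```python
# Sketch of a caching reverse proxy: cache miss pulls the file once from the
# archive node; subsequent hits are served from local fast storage.
import hashlib
import os
import shutil
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

ARCHIVE_UPSTREAM = "http://server1.domain.com"  # hypothetical upstream node
CACHE_DIR = "/var/cache/archive"                # hypothetical fast local storage

class CachingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Key the cache on the request path so repeat hits skip the slow disks.
        key = hashlib.sha1(self.path.encode()).hexdigest()
        cached = os.path.join(CACHE_DIR, key)
        if not os.path.exists(cached):
            # Cache miss: fetch once from the archive node, publish atomically.
            # (Not race-safe under concurrent misses; a real cache would lock.)
            with urllib.request.urlopen(ARCHIVE_UPSTREAM + self.path) as upstream, \
                 open(cached + ".tmp", "wb") as tmp:
                shutil.copyfileobj(upstream, tmp)
            os.rename(cached + ".tmp", cached)
        self.send_response(200)
        self.send_header("Content-Length", str(os.path.getsize(cached)))
        self.end_headers()
        with open(cached, "rb") as f:
            shutil.copyfileobj(f, self.wfile)

if __name__ == "__main__":
    os.makedirs(CACHE_DIR, exist_ok=True)
    ThreadingHTTPServer(("0.0.0.0", 8080), CachingProxy).serve_forever()
```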