
Take a look at this list: http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_parallel_fault-tolerant_file_systems

Which of these distributed parallel fault-tolerant file systems is the best choice for storing a large number of static media files for a website? The files are of normal sizes: 10KB-2MB for images and 5MB-500MB for video files.

Mickey Shine
  • What kind of media files (30Gb high-def mkvs and 3kb GIFs are not equal)? What kind of traffic (hits per second, unique IPs per second, % cache hits)? What kind of web server and base OS? – Chris Thorpe Mar 30 '10 at 07:05
  • media files are normal size, images: 10KB-2M, video files: 5M-500M. They are normal media files – Mickey Shine Mar 30 '10 at 07:20

3 Answers


Sorry to give a non-answer, but what you're asking raises an eyebrow. If you really need a clustered file system, I would expect some kind of explanation as to why.

Answer: None of the above -- go with a "filesystem" that isn't a POSIX compliant file system, but rather a HTTP service oriented towards file storage, replication and redundancy. Examples include MogileFS (self-hosted, originally created by Brad Fitzpatrick) or Amazon Simple Storage Service (hosted service), or Windows Azure's blobs (.NET, hosted service).

Based on your previous questions you appear to be just starting out, i.e. greenfield development. If so, then generally speaking you're better off purchasing file storage as a service at market price, rather than trying to build something yourself. Getting availability and replication right on a large scale is hard.

  • "better off" is disputable - under constant high load cloud hosting is terribly expensive. It does scale well, though ;) – TomTom May 02 '10 at 10:43
  • @TomTom: I don't disagree, cloud can get very expensive, and the best solution will depend on specific constraints in each case. But a) you're talking about CPU load(?), I'm talking about file storage; and b) in *most* cases I do not think one can build a do-it-yourself solution that will match cloud computing file store reliability and scalability at a similar price point, when all expenses (capital costs, labour costs) are factored in. –  May 02 '10 at 15:33
  • You can, by SIGNIFICANT margins. Heck, even a normal rented server is a lot cheaper, and file storage gets really interesting once you realize you can rent a configured SAN. All prices USD. Azure: 15 cents/GB/month; for 10,000 GB that is 1500 USD per month just for storage. Bad news: I can build a server handling this efficiently for about 6 months' rent, not even using large discs, probably a lease cost of around 400 USD. Note that data transfer goes on top of that: Azure is about 15 cents/GB out, while I pay about 15 USD/Mbit. Cloud is GREAT for spikes etc., but sucks for large, sustained 24/7 load. – TomTom May 02 '10 at 15:41
  • Every hoster beats cloud pricing hands down with normal server leases - but then, those don't scale as well and require a lot of commitment. – TomTom May 02 '10 at 15:43
  • @TomTom: It is not an apples-to-apples comparison. I'm not continuing this beyond this comment; we're getting far away from OPs original question. –  May 02 '10 at 20:34
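TomTom's back-of-the-envelope arithmetic above can be sketched as a quick cost model (a minimal illustration using the circa-2010 figures quoted in the comments; real pricing varies by provider, tier, and year, and excludes bandwidth):

```python
# Back-of-the-envelope storage cost comparison, using the figures from
# the comment thread above: ~USD 0.15/GB/month for Azure blob storage
# (2010-era pricing, illustrative only). Bandwidth is not included.

def monthly_storage_cost(gigabytes, rate_per_gb=0.15):
    """Monthly storage cost in USD at a flat per-GB rate."""
    return gigabytes * rate_per_gb

# 10,000 GB (10 TB) of media at 15 cents/GB/month:
cost = monthly_storage_cost(10_000)
print(f"10 TB at $0.15/GB/month: ${cost:,.0f}/month")
```

This is the comparison TomTom is making: at that rate, six months of cloud storage for 10 TB (~9,000 USD) is in the ballpark of buying or leasing a dedicated storage server outright, which is why sustained 24/7 load favors self-hosting while spiky load favors cloud.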

I agree with Jesper on this one; personally I'd use a pair of Zeus ZXTMs as coherent caches in front of a pair of standard HTTP servers serving from a single shared NFS/CIFS mount.

That said, if you do want to go the clustered file system route, I've had a lot of good experiences with Ibrix myself.

Chopper3

GlusterFS has a very good reputation, but may be overkill for your application. Have you considered distributing the files on server pairs, e.g.

  • 2 IPs for static1.yourdomain.com
  • 2 IPs for static2.yourdomain.com
  • ...

Always two servers for redundancy. Should any static group get too much traffic, add more servers to that group for scaling.
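The static-group scheme above needs a stable way to decide which group serves which file. One simple approach (a sketch, not part of the original answer; the hostnames and group count are hypothetical) is to hash each file path to a group, so a given file always resolves to the same redundant pair:

```python
import hashlib

# Map a media file path to one of N static groups, each backed by a
# redundant pair of servers (static1.yourdomain.com, static2..., etc.).
# Hashing the path keeps the mapping stable, so a file's URL never
# changes as long as the group count stays fixed. Hot groups can then
# be scaled by adding servers behind the same hostname, without
# re-sharding.

NUM_GROUPS = 4  # hypothetical; changing this re-shards existing URLs

def static_host(path, num_groups=NUM_GROUPS):
    """Pick a static group hostname for a file path, deterministically."""
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    group = int(digest, 16) % num_groups + 1
    return f"static{group}.yourdomain.com"

def media_url(path):
    """Build the public URL for a media file."""
    return f"http://{static_host(path)}/{path.lstrip('/')}"

print(media_url("images/logo.png"))
```

Note this inherits the availability caveat raised in the comments below the answer: pointing each group hostname at two IPs via DNS round robin is only "mostly available", since cached DNS answers keep sending traffic to a dead server; a load balancer per group avoids that.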

korkman
  • -1... DNS round robin is not high availability (due to DNS caching, a dead server will still get hit). The idea is sound, the implementation sucks. Rather use static.yourdomain.com and a load balancer (software or hardware) to route to available hosts. – TomTom May 02 '10 at 07:57
  • That said, a separate domain like static-yourdomain.com is better, as it makes sure that cookies are not transmitted even if the programmer forgets about it ;) – TomTom May 02 '10 at 07:58
  • Correct. I only mentioned this method as a simple, "mostly available" alternative because high availability is possibly not a requirement from OP. – korkman May 03 '10 at 18:47