2

I want to build a file server that serves ~50 TB of content to its users. To maximize the server's throughput, I'm planning the following setup.

  • 50 TB of HDD storage. All of the static files reside here.
  • 6 TB of SSD storage. This acts as a cache for the most popular content.
  • A cache manager that decides what should reside on the HDDs and what on the SSDs.

Based on this architecture, the most popular files are copied to the SSDs and served from there. The cache manager is custom software, designed around my application's characteristics.
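
Roughly, the promotion logic I have in mind looks like the sketch below (the paths, the log format, and the top-100 cutoff are placeholders; the real manager would rank files using my application's own popularity metrics):

```bash
#!/bin/bash
# Sketch: promote the most-requested files from the HDD pool to the SSD cache.
# Assumes an nginx-style access log where field 7 is the request path.
HDD_ROOT=/srv/hdd
SSD_CACHE=/srv/ssd-cache
ACCESS_LOG=/var/log/nginx/access.log

# Find the 100 most-requested paths and copy them onto the SSD,
# preserving the directory layout so the web server can try the SSD first.
awk '{ print $7 }' "$ACCESS_LOG" | sort | uniq -c | sort -rn | head -n 100 |
while read -r hits path; do
    rsync -a --relative "$HDD_ROOT/./$path" "$SSD_CACHE/"
done
```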

I have a few questions regarding this plan.

  1. Should I be worried about SSD write limits?
  2. Is there any cache framework that I can use to write my special-purpose cache manager, based on my own rules?
Sadjad
    As for the write limit, I don't think this should be of any concern. You should still check the drive endurance for the particular SSDs you are planning to use, but if you're not choice-limited, you can always use decent enterprise-grade SSDs that are built for that sort of abuse (e.g. the Ultrastar SSD800MH is rated for 25 DW/D, i.e. 25 full drive writes per day, which is far more than most use cases need). – Kęstutis Jun 12 '15 at 11:08

2 Answers

2

Don't reinvent the wheel. Use ZFS.

But you have other architectural concerns as well, like networking, tuning, and your client systems. It would also be helpful to describe the context for this and what you currently have in place.
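
To give an idea of how little there is to build: with the bulk storage in a ZFS pool, the SSD becomes an L2ARC read cache with a single zpool command once the pool exists (pool name, layout, and device names below are purely illustrative):

```bash
# Illustrative only: pool name, RAID level and device names will differ on your hardware.
zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
zpool add tank cache /dev/nvme0n1      # attach the SSD as an L2ARC (read cache) device
zpool status tank                      # the SSD now shows up under the 'cache' section
```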

ewwhite
  • Thanks, @ewwhite! I'm new to ZFS, so could you give me some pointers to begin with? – Sadjad Jun 12 '15 at 11:13
  • 1
    @user44635 I think [ZFS cache: ARC (L1), L2ARC, ZIL](https://en.wikipedia.org/wiki/ZFS#ZFS_cache:_ARC_.28L1.29.2C_L2ARC.2C_ZIL) should give you some basic understanding (L2ARC is probably what you want) and from there, there's plenty of good reads on the internet. – Kęstutis Jun 12 '15 at 11:17
  • 1
    See this article on [ZFS caching](https://pthree.org/2012/12/07/zfs-administration-part-iv-the-adjustable-replacement-cache/). – ewwhite Jun 12 '15 at 11:18
2

Although ZFS can do it, as ewwhite says, another solution might be bcache. I'm using it in a totally different scenario (a 2 TB HDD and a 128 GB SSD in my laptop; bcache makes loading Civ V a lot nicer ;-)), but it works very nicely.
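
If you want to try it, a minimal setup looks roughly like this (device names are placeholders, and make-bcache will wipe whatever is on them):

```bash
# The HDD becomes the backing device and the SSD the cache device;
# creating them in one command attaches them to each other automatically.
make-bcache -B /dev/sdb -C /dev/nvme0n1
mkfs.ext4 /dev/bcache0
mount /dev/bcache0 /srv/files
# Optional: writeback mode caches writes on the SSD too (the default is writethrough).
echo writeback > /sys/block/bcache0/bcache/cache_mode
```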

Depending on how you serve the files, you might also want to consider something like Varnish, which you can set up to use the SSD as its cache store.
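
If the content is served over HTTP, that can be as simple as pointing varnishd's file storage at the SSD and its backend at whatever serves the HDDs (addresses, path, and size are placeholders):

```bash
# Listen on :80, fetch misses from the origin on :8080, keep the cache file on the SSD.
varnishd -a :80 \
         -b 127.0.0.1:8080 \
         -s file,/ssd/varnish-cache.bin,5120G   # ~5 TB, leaving headroom on the 6 TB SSD
```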

Regarding using your own rules, don't do that. Lots of smart people have worked on this problem; you want to stand on their shoulders, IMHO.

Depending on how often you expect the most-used content to change, I wouldn't worry about SSD write endurance either. You could also put your SSDs in a RAID 10 array to get even more performance out of them. And add a lot of RAM, so files can be served from the kernel's in-memory page cache as well.
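
The RAID 10 part is just a few commands with mdadm (device names are placeholders for four or more SSDs):

```bash
# Stripe+mirror the SSDs, then put the cache filesystem on top of the array.
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
mkfs.ext4 /dev/md0
mount /dev/md0 /srv/ssd-cache
```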

This all assumes a Linux machine, I guess.

Tim Stoop
  • So now, I'm torn! What are some advantages/disadvantages of using something like bcache, or even a solution like Varnish, over ZFS? – Sadjad Jun 12 '15 at 13:11
  • I think personal preference is the biggest difference, along with how you want to use it. Varnish does HTTP, so it'll only work if you serve over HTTP, but it's a really good cache. ZFS is not part of the vanilla kernel, so you'll need to either find a precompiled kernel with ZFS for your distro or build your own. bcache is part of the vanilla kernel, so you only need to install the required tools. Which you like better depends on how you will implement the file sharing and on personal preference. – Tim Stoop Jun 12 '15 at 13:34