
Currently I am running a production environment with 4 dedicated memcached servers, each with 48 GB of RAM (42 GB dedicated to memcached). Right now they are doing fine, but traffic and content are growing and will surely keep growing next year too.

What are your thoughts on strategies for scaling memcached further? How have you done it so far?

Do you add more RAM to the boxes up to their full capacity, effectively doubling the cache pool on the same number of boxes? Or do you scale horizontally by adding more of the same boxes, with the same amount of RAM?

The current boxes can surely handle more RAM, as their CPU load is quite low and the only bottleneck is memory, but I wonder whether it wouldn't be a better strategy to distribute the cache, making things more redundant and minimizing the impact on the cache of losing one box (losing 48 GB of cache versus losing 96 GB). How would you (or how did you) handle this decision?
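To put numbers on that impact, here is a quick back-of-the-envelope sketch (plain Python; the node counts and sizes are just the two options above, adjust to taste):

```python
# Back-of-the-envelope: fraction of the cache pool lost when one node dies,
# comparing "scale up" (4 x 96 GB) against "scale out" (8 x 48 GB).
# Node counts and sizes are hypothetical; plug in your own.

def single_failure_impact(node_count, gb_per_node):
    total_gb = node_count * gb_per_node
    # clients shard keys across the pool, so one node holds roughly 1/N of the cache
    return total_gb, gb_per_node, gb_per_node / total_gb

for nodes, gb in [(4, 96), (8, 48)]:
    total, lost_gb, lost_frac = single_failure_impact(nodes, gb)
    print(f"{nodes} x {gb} GB: pool = {total} GB, "
          f"one failure loses {lost_gb} GB ({lost_frac:.1%} of the cache)")
```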

Kenny Rasschaert
danakim
  • Are you stuck on memcache, or would you be open to other options? – Mike Sep 29 '11 at 12:38
  • Stuck with memcache. It has served us well up until now, so there is no reason to change. And I am pretty sure it can scale quite nicely; the question is which would be the best way. – danakim Sep 29 '11 at 14:55

2 Answers


When I've done this, there is usually a break-even point between box size (rack space cost), the expense of high-density memory, and failure scenario handling. This almost always ends up with a configuration below the maximum memory density (and usually not the fastest chips available), which, as you mentioned, reduces the impact of a node failure and usually makes the boxes more cost-effective. Some costs/things to consider when making this choice:

  • node cost (cpu/mem/etc)
  • rack space cost
  • administrative overhead/cost
  • failure scenarios (are you trying to do N+1?)

I have also maxed out existing boxes as clusters grow (usually when they are pretty small), since in the short term it can be significantly cheaper to buy some more memory, which buys you time to make larger architectural decisions.
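A rough sketch of how those factors can be weighed against each other; every price, size and rack cost below is a made-up placeholder, the point is only the shape of the comparison under an N+1 budget:

```python
# Rough cost-per-usable-GB comparison under an N+1 failure budget.
# All prices, sizes and rack costs are placeholders; substitute your own quotes.

options = [
    # (label, node_count, gb_per_node, cost_per_node_usd, rack_units_per_node)
    ("scale out: 8 x 48 GB", 8, 48, 3500, 1),
    ("scale up:  4 x 96 GB", 4, 96, 6500, 1),  # high-density DIMMs tend to carry a premium
]

RACK_COST_PER_U_PER_YEAR = 300  # placeholder hosting cost

for label, nodes, gb, node_cost, ru in options:
    usable_gb = (nodes - 1) * gb        # N+1: size the pool to survive one node failure
    capex = nodes * node_cost
    yearly_rack = nodes * ru * RACK_COST_PER_U_PER_YEAR
    print(f"{label}: usable = {usable_gb} GB, capex = ${capex}, "
          f"rack/yr = ${yearly_rack}, capex per usable GB = ${capex / usable_gb:.2f}")
```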

polynomial
  • Thanks a lot for the advice! I think I might go for just scaling horizontally as the high density memory chips are quite expensive and the impact of one node failure is quite high. I would rather spread this around. – danakim Oct 03 '11 at 11:27

I so want to know what it is you're moving that consumes over 100 GB of memory while not maxing out your NICs.

Memcache scales fairly linearly between machines, so the questions you have to ask are:

  • Is my system bus currently saturated?
    • This might not show up as CPU usage; DMA transfers won't register that way
  • How expensive is the high-density memory versus a new box containing the increased amount of memory?
    • Full cost of rack space, power consumption, etc.
  • Do you see a fundamental difference between losing 25% of your cache 1% of the time and 12.5% of your cache 2% of the time? (Randomly chosen failure rates; see the quick calculation below.)
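To put that last bullet in numbers (using those same randomly chosen failure rates, not measurements):

```python
# Expected fraction of the cache unavailable at any given moment, for both scenarios.
# The failure rates are the arbitrary ones from the bullet above, not real data.

scenarios = [
    ("fewer, bigger nodes",  0.25,  0.01),  # lose 25% of the cache, 1% of the time
    ("more, smaller nodes", 0.125, 0.02),   # lose 12.5% of the cache, 2% of the time
]

for label, lost_fraction, downtime_fraction in scenarios:
    expected = lost_fraction * downtime_fraction
    print(f"{label}: expected cache loss = {expected:.2%}, "
          f"worst single event = {lost_fraction:.1%}")
```

The expected loss comes out the same either way; the difference is how big the worst single hit to your hit rate is when a box does go down.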

Scaling is 10% intuition, 70% measuring and adapting, and 20% going back and trying something else.

Load 'em up until they max out the weakest link or stop being cost-effective. They may or may not already be there.
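One way to see whether a box is already against its weakest link is to pull the raw counters over the memcached text protocol. A minimal sketch (the host list is a placeholder for your own pool):

```python
import socket

# Dump the memcached stats that usually point at the bottleneck:
# memory (bytes vs limit_maxbytes, evictions), hit rate and network volume.
HOSTS = [("cache1.example.com", 11211), ("cache2.example.com", 11211)]  # placeholders
INTERESTING = {"bytes", "limit_maxbytes", "evictions", "get_hits", "get_misses",
               "bytes_read", "bytes_written", "curr_connections"}

for host, port in HOSTS:
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(b"stats\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    print(f"== {host} ==")
    for line in data.decode().splitlines():
        if line.startswith("STAT "):
            _, name, value = line.split(maxsplit=2)
            if name in INTERESTING:
                print(f"  {name}: {value}")
```

If evictions climb while the NICs stay quiet, memory really is the weakest link; if bytes_read/bytes_written approach wire speed, more RAM per box won't help.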

Jeff Ferland
  • Thanks a lot for the advice! As I told polynomial above, I am going to go for more boxes instead of more memory. Going for 96 GB of RAM is quite expensive, and looking at the impact of a node failure on the application, I would like to minimize that. And regarding your question: each box has 48 GB of RAM and a gigabit link, and their connections peak at about 150 Mbps, so there is room to grow the RAM, at least in terms of network bandwidth. – danakim Oct 03 '11 at 11:34