
Hypothetically, if I have multiple memcached servers like this:

//PHP 
$MEMCACHE_SERVERS = array(
    "10.1.1.1", //web1
    "10.1.1.2", //web2
    "10.1.1.3", //web3 
); 
$memcache = new Memcache();
foreach($MEMCACHE_SERVERS as $server){
    $memcache->addServer($server);
}

And then I set data like this:

$huge_data_for_front_page = 'some data blah blah blah';
$memcache->set("huge_data_for_front_page", $huge_data_for_front_page);

And then I retrieve data like this:

$huge_data_for_front_page = $memcache->get("huge_data_for_front_page");

When I retrieve this data from the memcached servers, how would the PHP memcache client know which server to query for this data? Or is the memcache client going to query all memcached servers?

Stann
  • Thanks for asking this. So all in all, it seems like more write/read throughput is the objective and not redundancy? I'm looking for redundancy, and this at least confirms what I thought. – Till Sep 18 '11 at 15:39
  • [This question](http://stackoverflow.com/questions/4038094/using-multiple-memcache-servers-in-a-pool/4038108#4038108) isn't exactly the same, but my answer there should answer your question as well. – Harper Shelby Jan 17 '11 at 20:49
  • thanks... Do you know if the memcached client balances write requests automatically? Or does it go to the first server until it fills up, and then to the second one until the second one fills up, and then the third one, etc.? – Stann Jan 17 '11 at 22:37
  • IIRC, there are multiple hashing strategies for memcache, but the default is a stable hashing algorithm. This means that a given key will (assuming the server configuration remains stable) always be on the same server. It also means that the hash used determines which server will get a given key, so that the load should be fairly balanced. – Harper Shelby Jan 17 '11 at 22:41

2 Answers


Well, you could write books about that, but the basic principle is that there are a few different approaches.

The most common and sensible approach for caching is sharding, which means the data is stored on only one server and some method is used to determine which server that is. It can then be fetched from that very server, and only one server is involved.

This obviously works well in key/value environments such as memcached.

A common practice is to take a cryptographic hash of the key, calculate that hash MOD the number of servers, and the result is the server on which you store and fetch the data.

This procedure produces more or less equal balancing.
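For illustration, here is a minimal sketch of that hash-MOD-N approach in PHP (the server list and key are the ones from the question; this is not necessarily the exact algorithm memcached uses internally):

//PHP - hypothetical sketch of sharding by hash MOD number of servers
$servers = array(
    "10.1.1.1", //web1
    "10.1.1.2", //web2
    "10.1.1.3", //web3
);

function pick_server(array $servers, $key) {
    $hash = crc32($key) & 0x7fffffff;   // non-negative integer hash of the key
    $index = $hash % count($servers);   // map the hash onto one of the server slots
    return $servers[$index];
}

// the same key always maps to the same server, so only that one server is queried
echo pick_server($servers, "huge_data_for_front_page");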

How exactly it is done in memcached I don't know, but it is some sort of hash for sure.

But beware that this technique is not highly available: if one server fails, its entries are gone. So you obviously can only use it for caching purposes.

Other techniques involve replication, for example where high availability is necessary for resources that take long to compute and are automatically warmed up in the background.

The most common form in caching environments is master-master replication with latest-timestamp conflict resolution, which basically means every server pulls from every other server the data that is not yet on the local server (this is done using replication logs and byte offsets). If there is a conflict, the latest version is used (the slight time offset between servers is ignored).

But in other environments, where for example only very little is written but a lot is read, there is often a cascade where only one or a few master servers are involved and the rest is just pure read replication.

But these setups are very rare, because sharding as described above gives the best performance, and in caching environments data loss is mostly tolerable. So it's also the default for memcached.

The Surrican
  • I've been looking for answer to this question for a little while now. Thanks for such a great response. Answered all my concerns. – Jamie Carl Aug 16 '12 at 05:45

Some days ago I was looking for a solution to optimize the scaling of our memcached servers and found this answer. From the experience we have had, the described solution of generating a hash and taking it MOD the number of servers to find the target server isn't the best one.

If you scale the number of servers up or down, it will likely result in much the same scenario as flushing the cache: most of the hashes map to another server, so there won't be a result from the cache for the first request.

The best solution for such scenarios is consistent hashing. With consistent hashing, every server gets a fixed hash range, so if you now scale the number of servers up or down, only the hashes in that particular range will be switched to another server. All other hashes remain on their servers, and only a small part has to be regenerated.
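As an illustration only (a simplified sketch, not flexihash's implementation), a consistent-hashing ring in PHP might look like this: each server is placed at several points on a ring, and a key is assigned to the first server found clockwise from the key's hash, so adding or removing a server only remaps the keys in that server's range:

//PHP - simplified consistent hashing sketch (illustration, not flexihash)
class ConsistentHashRing {
    private $ring = array();   // ring position => server
    private $replicas = 64;    // virtual nodes per server for smoother balancing

    public function addServer($server) {
        for ($i = 0; $i < $this->replicas; $i++) {
            $position = crc32($server . '#' . $i) & 0x7fffffff;
            $this->ring[$position] = $server;
        }
        ksort($this->ring);    // keep ring positions sorted
    }

    public function lookup($key) {
        $hash = crc32($key) & 0x7fffffff;
        foreach ($this->ring as $position => $server) {
            if ($hash <= $position) {
                return $server;        // first server clockwise from the key's hash
            }
        }
        return reset($this->ring);     // wrap around to the start of the ring
    }
}

$ring = new ConsistentHashRing();
foreach (array("10.1.1.1", "10.1.1.2", "10.1.1.3") as $server) {
    $ring->addServer($server);
}
echo $ring->lookup("huge_data_for_front_page");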

For PHP there is a library called 'flexihash' which does the consistent hashing for you.

In our blog, you can find an example of how to use it with your own cache client. The article is in German, but the source code should be self-explanatory.
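As a side note, if you use the newer pecl Memcached extension (rather than the Memcache extension from the question), consistent hashing is built in and can be enabled through client options; a minimal sketch:

//PHP - assumes the pecl "Memcached" extension, not "Memcache"
$memcached = new Memcached();
$memcached->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
$memcached->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true); // ketama-style consistent hashing
$memcached->addServers(array(
    array("10.1.1.1", 11211), //web1
    array("10.1.1.2", 11211), //web2
    array("10.1.1.3", 11211), //web3
));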

joocom