0

I have some data that I want to store in memcached (using the PHP libmemcached client: https://www.php.net/manual/en/intro.memcached.php). It's something that's hit extremely frequently across my web app.

In order to reduce the amount of traffic to a single memcached node I append a random number between 1 and 10 to the end of the key, in hopes that the client will not store all of the keys on a single node.

I had assumed that the process of assigning a key was random, but across 15 nodes at least half of the keys went to the same node. That makes me think there is something more deterministic about how it decides which node to use for a given key.

Does anyone know how it's done?

Zachary Wright
  • 23,480
  • 10
  • 42
  • 56

1 Answers1

1

It uses a hash. In the simplest form, imagine if you run a hash function like MD5 on a key, you can use the first byte to figure out which server it should go to.

This is important, because if 2 servers talk to multiple memcached servers they need to reliable all select the same server for the same key. Random would be bad because it means that a client might try to get() from a different server where the item was stored.

If you have 15 nodes and more than half of the items were stored in 1 node, you are either: 1. Extremely unlucky or 2. Something is not configured right and some of your servers are marked offline.

The underlying hash is more complex than a simple 'md5', it uses a 'consistent hashing' algorithm. What this means is if you have 15 nodes, and lose 1, most keys will still resolve to the same server. There's lengthy articles about 'consistent hashing' so it should be easy to get technical details.

Evert
  • 93,428
  • 18
  • 118
  • 189
  • Ah, ok. That is what I meant by "random" (ie, not alphabetical or something else like that that would cause similar names to be grouped together.) I can take a look into our configuration, or maybe just assume I need a bigger spread of keys to reduce the likelihood of them grouping up. – Zachary Wright Jun 03 '20 at 22:44
  • 1
    @ZacharyWright the hash it uses should have good distribution. Even if the keys are 99% identical, the spread should be statistically even. – Evert Jun 03 '20 at 22:50