
I'm looking for something like Consistent Hashing, but with a guarantee that a distribution ends up as fair as possible (not just on average for random keys) - is there such a thing and where can I find it if so?

Edit: In my specific case, the set of keys is known up front (and "small"). Exactly these keys will always be present and must be allocated to exactly one node each at any given point in time.

SoftMemes
    +1, seems a perfectly reasonable question to me: "Is there a known algorithm with all the properties of algorithm X, plus additional property Y?". Clearly the questioner has already done enough research alone to find algorithm X. – Steve Jessop Jun 12 '12 at 09:02
    I don't even see any requirement for research in the FAQ – I think there's an imaginary FAQ that exists in the minds of people over-concerned about "homework" that is polluting everything else. Some of us are very happy to be "research assistants" for interesting questions. – andrew cooke Jun 12 '12 at 13:07
  • I don't think what you're asking for is practically achievable. With n nodes, there are 2^n possible availability scenarios; I doubt it's possible to devise an algorithm that assigns responsibilities fairly under all those scenarios. – Nick Johnson Jun 14 '12 at 04:07
  • @Nick, doing it fairly and deterministically is not a problem, just round robin over the nodes and you're done. Doing it while still keeping the property that movements of keys should be minimal when nodes enter or leave, however, is a lot more difficult. – SoftMemes Jun 14 '12 at 11:51
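The trade-off in that last comment can be made concrete. The sketch below (illustrative, not from the discussion) shows that plain round-robin assignment is perfectly fair, yet adding a single node reassigns most keys, which is exactly the property consistent hashing is meant to avoid:

```python
# Round-robin assignment: perfectly fair at any node count,
# but almost every key moves when a node joins or leaves.

def round_robin_assign(keys, nodes):
    """Assign the i-th key to node i mod len(nodes)."""
    return {key: nodes[i % len(nodes)] for i, key in enumerate(keys)}

keys = [f"key{i}" for i in range(12)]

before = round_robin_assign(keys, ["A", "B", "C"])
after = round_robin_assign(keys, ["A", "B", "C", "D"])  # one node added

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved")  # prints "9 of 12 keys moved"
```

Every bucket is within one key of every other, so fairness is exact; the cost is that only the first three keys keep their old node.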

2 Answers


Sounds to me like you're looking for a minimal perfect hash.

Jim Mischel
  • Not really. In my case, I do want collisions (multiple keys being allocated to the same nodes), and I have specific requirements for what should happen when the number of nodes changes. – SoftMemes Jun 13 '12 at 10:00

"not just on average for random keys"

This is not an accurate description of the guarantees provided by consistent hashing. First, "on average" does not capture the fact that, with random placement of a large number of virtual nodes on the circle and a good family of hash functions (e.g., one that is O(log n)-wise independent), a large load imbalance is very unlikely (the typical imbalance is on the order of the square root of the number of keys assigned to a particular machine). Second, the keys don't have to be random, as long as they don't depend on the randomly chosen hash function (an oblivious adversary).
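A minimal consistent-hashing sketch (my own illustration, not the answer's exact construction) makes the probabilistic nature of this balance visible: with many virtual nodes per physical node the load is close to even, but the deviations are random-looking rather than bounded:

```python
# Consistent hashing with virtual nodes: each physical node is hashed
# onto the circle many times, and a key goes to the first point
# clockwise from its own hash. Balance is good with high probability,
# but it is not an exact guarantee.
import bisect
import hashlib

def h(s):
    # Deterministic stand-in for a random hash function.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=100):
    return sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

def lookup(ring, key):
    points = [p for p, _ in ring]
    i = bisect.bisect(points, h(key)) % len(ring)  # wrap around the circle
    return ring[i][1]

nodes = ["A", "B", "C"]
ring = build_ring(nodes)
counts = {n: 0 for n in nodes}
for i in range(3000):
    counts[lookup(ring, f"key{i}")] += 1
print(counts)  # roughly 1000 per node, with random-looking deviations
```

Raising `vnodes` tightens the concentration, but no finite number of virtual nodes makes the split exactly even for every key set.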

Since you want hashing that is always fair, randomization won't help: the RNG might produce exactly the unlucky outcome you are trying to avoid. And no deterministic algorithm can statically assign keys to nodes without the possibility of imbalance, unless the keys are known offline.

If you have sufficiently few items that you care about square root imbalances, you can do old-fashioned stateful load balancing.

magic
  • The variation (standard deviation) *in a bin* is the square root of the number of values *in that bin* (assuming "random", i.e. Poisson, statistics). – andrew cooke Jun 12 '12 at 15:43
  • Agreed on the description of consistent hashing, however - in this case I do need even stronger guarantees, while I would still very much prefer not having explicit shared state (but being able to derive the allocation from only the bins and data). In my case, the set of keys is in fact known ahead of time, I should have included that in my original question (will update). – SoftMemes Jun 12 '12 at 16:23