
I am trying to understand distributed cache in-depth.

Say I have a distributed cache with partitions 1, 2 and 3.

Say process p1 tries to write key "K1" with value "Value1" to the cache. Based on the key, the algorithm determines which cache to write to, and K1 is written to partition 1. Is it possible that a read request for K1 goes to partition 2 or 3? Or, for partitioned caching to work correctly, should read, write and update requests for a key always go to a particular partition (in this case, should all requests for K1 always go to partition 1)?
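
To make my mental model concrete, here is a minimal sketch of the deterministic routing I am assuming (simple modulo hashing; the partition names and helper function are made up for illustration, and real caches typically use consistent hashing instead):

```python
import hashlib

# Assumption: three partitions, addressed by name.
PARTITIONS = ["partition-1", "partition-2", "partition-3"]

def partition_for(key: str) -> str:
    # The same key always hashes to the same number, so reads, writes
    # and updates for "K1" all land on the same partition.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return PARTITIONS[digest % len(PARTITIONS)]

print(partition_for("K1"))                          # always the same partition for K1
print(partition_for("K1") == partition_for("K1"))   # True
```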

Nishant

1 Answer


It depends on the distributed cache service and the configured mode of operation.

Distribution-aware clients with servers configured in standalone mode

  1. Clients are the distribution-aware agents in this configuration
  2. Each client is initialized with a list of server endpoints
  3. Clients are initialized with a hashing strategy (preferably the same hashing strategy across all clients, so that a key set by one client can be retrieved by another)
  4. The server just acts as a key/value store
  5. To store a key/value pair, the client hashes the key (as per the strategy) and forwards the request to the corresponding server
  6. If that server is unavailable, the client can choose a fallback hash strategy to select a different server (this can be retried down to the last server). In this case, reconciling values across different servers can lead to data inconsistency during network partitions.
  7. Or, if the server is unavailable, the client can simply not store the value in the cache and return an error (see the sketch after this list)
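
A minimal sketch of this client-side routing, including the fallback from step 6 (the class name and the in-memory dicts standing in for remote servers are assumptions made for illustration):

```python
import hashlib

class DistributionAwareClient:
    """Client picks the server by hashing the key; servers are dumb key/value stores."""

    def __init__(self, servers):
        # servers: list of endpoint identifiers; here each "server" is
        # simulated by a dict, whereas a real client would hold network connections.
        self.servers = servers
        self.stores = {s: {} for s in servers}      # stand-in for remote servers
        self.available = {s: True for s in servers}

    def _candidates(self, key):
        # Primary server from the hash, then fallback servers in order;
        # all clients must use the same strategy to find each other's keys.
        start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(self.servers)
        return [self.servers[(start + i) % len(self.servers)] for i in range(len(self.servers))]

    def put(self, key, value):
        for server in self._candidates(key):
            if self.available[server]:
                self.stores[server][key] = value
                return server
        raise RuntimeError("no cache server available")   # or simply return an error

    def get(self, key):
        for server in self._candidates(key):
            if self.available[server]:
                return self.stores[server].get(key)
        raise RuntimeError("no cache server available")

client = DistributionAwareClient(["server-1", "server-2", "server-3"])
print(client.put("K1", "Value1"))   # server chosen purely by the hash
print(client.get("K1"))             # "Value1", the same server is selected again
```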

From a setup perspective, this is easy and simple, but from a scaling and debugging perspective it will be slightly more difficult.

Servers in cluster mode and the client as just a proxy

  1. Servers are the distribution-aware agents in this configuration
  2. Servers are set up as a quorum and each server knows about all the other servers
  3. Servers are initialized with a consistent hashing strategy to handle load and to recover effectively in case of a node failure
  4. Every server knows the partition of keys allocated to every other server and hence can forward requests
  5. Clients are configured with a set of servers
  6. A client can make a call to any server, and the server cluster takes care of delegating the request to the correct server and returning the response to the client (see the sketch after this list)
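
A minimal sketch of this server-side routing (the class name is illustrative, ownership is decided by a simplified hash rather than a full consistent-hash ring, and the "network" between servers is just direct method calls):

```python
import hashlib

class ClusterServer:
    """Each server knows the full partition map and forwards requests it does not own."""

    def __init__(self, name):
        self.name = name
        self.local_store = {}
        self.cluster = []   # populated once the cluster is formed

    def _owner(self, key):
        # Simplified ownership rule; a real cluster would use consistent
        # hashing so that a node failure moves as few keys as possible.
        idx = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(self.cluster)
        return self.cluster[idx]

    def put(self, key, value):
        owner = self._owner(key)
        if owner is self:
            self.local_store[key] = value
        else:
            owner.put(key, value)   # forward to the owning server
        return owner.name

    def get(self, key):
        owner = self._owner(key)
        return owner.local_store.get(key)

# Form a three-node cluster; every node knows every other node.
nodes = [ClusterServer(f"server-{i}") for i in (1, 2, 3)]
for node in nodes:
    node.cluster = nodes

# The client is just a proxy: it can talk to any node.
print(nodes[0].put("K1", "Value1"))   # the request is forwarded to the owner
print(nodes[2].get("K1"))             # any node can serve the read
```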

There are variants of this combination that mix distribution awareness into both client and server, but it's generally preferred to keep the logic on one side to enable efficient debugging in case of issues.

Consensus

Alternatively, if you are looking for a consensus system with a low volume of data (it can have high reads and low writes), then please look into

  1. ZAB-based design (ZooKeeper)
  2. Raft-based design (etcd)
  3. Paxos-based design (Google's distributed consensus systems may be based on Paxos)
Thiyanesh