My use case is as follows:

We have about 500 servers running in an autoscaling EC2 cluster that need to access the same configuration data (laid out in a key/value fashion) several million times per second.

The configuration data isn't very large (1 or 2 GB) and doesn't change much (a few dozen updates/deletes/inserts per minute during peak time).

Latency is critical for us, so the data needs to be replicated and kept in memory on every single instance running our application.

Eventual consistency is fine. However, we need to make sure that every update will be propagated at some point (knowing that the servers can be shut down at any time). Update propagation across the servers should be reliable and easy to set up (we can't have static IPs for our servers, and we don't want to go down the route of "faking" multicast on AWS, etc.).

Here are the solutions we've explored in the past:

  • Using regular Java maps with our custom-built system to propagate updates across the cluster (obviously, it doesn't scale that well).
  • Using EhCache and its replication feature, but setting it up on EC2 is very painful and somewhat unreliable.

Here are the solutions we're thinking of trying out:

  • Hazelcast's Replicated Map
  • Apache Geode
  • Apache Ignite

I would like to know whether each of these solutions would work for our use case and, if so, what issues I'm likely to face with each of them.

Here is what I found so far:

  • Hazelcast's Replicated Map is somewhat recent and still a bit unreliable (async updates can be lost when scaling down).
  • It seems like Geode became "stable" fairly recently (even though it has supposedly been in development since the early 2000s).
  • Ignite looks like it could be a good fit, but I'm not too sure how its S3-based discovery system will work out if we keep adding/removing nodes regularly.

Thanks!

Maxime
  • Geode is the open-source version of GemFire, and GemFire has been around for quite a while. When you are doing research, it might be helpful to search for GemFire-related discussions as well since, for the basics, GemFire and Geode work pretty much the same. – Xiawei Zhang Feb 15 '17 at 03:03

3 Answers

Geode should work for your use case. You should be able to use a Geode Replicated region on each node. You can choose to do synchronous OR asynchronous replication. In case of failures, the replicated region gets an initial copy of the data from an existing member in the system, while making sure that no in-flight operations are lost.

In terms of configuration, you will have to start a few member discovery processes (Geode locators) and point each member at these locators. (We recommend that you start one locator per AZ and use 3 AZs to protect against network partitioning.)
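
For illustration, here is a minimal sketch of a peer member hosting a replicated region, assuming the Geode Java API; the locator addresses and the region name "configData" are placeholders, not something from the question:

```java
// A minimal sketch, assuming the Geode Java API. The locator addresses and the
// region name "configData" are placeholders, not taken from the question.
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

public class ConfigCacheMember {
    public static void main(String[] args) {
        // Point this member at the locators (one per AZ, as recommended above).
        Cache cache = new CacheFactory()
            .set("locators", "10.0.1.10[10334],10.0.2.10[10334],10.0.3.10[10334]")
            .create();

        // REPLICATE keeps a full copy of the region in memory on every member that hosts it.
        Region<String, String> config = cache
            .<String, String>createRegionFactory(RegionShortcut.REPLICATE)
            .create("configData");

        config.put("feature.flag.x", "enabled");
        String value = config.get("feature.flag.x"); // served from local memory
    }
}
```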

Geode/GemFire has been stable for a while, powering low-latency, high-scalability workloads such as the reservation systems of the Indian and Chinese railways, among other users, for a very long time.

Disclosure: I am a committer on Geode.

Swapnil
  • If I opt for synchronous replication, will I have to wait until the update is propagated to all 500 servers? Or will it just make sure that it's replicated on at least one/two/three servers? You also mentioned that discovery happens using discovery processes. Does it work across AWS Regions as well? – Maxime Feb 15 '17 at 02:06
  • I recommend Swapnil's solution along with the CQRS architectural pattern to handle your updates via eventual consistency. I contributed this solution in Vaughn's book "Implementing Domain-Driven Design" several years ago with GemFire. Your millions of reads/sec are served from your read-only cluster across your nodes, which is optimized for reads. A smaller cluster handles your writes and is optimized for your write aggregates. Geode's natural guaranteed async-eventing mechanism will carry the updates across to your read cluster. Geode has the advantage here with guaranteed async events and CQ. – Wes Williams Feb 16 '17 at 18:03
  • As for the discovery process, yes, it will work across AWS Regions as long as the IP is visible. The locators exist for high scalability, ease of management, and protection against split-brain. Geode will guarantee that replicas are in different AZs for consistency protection. If your latency is high between the Regions, then you can opt for Geode's WAN Gateway to guarantee updates. The WAN Gateway has been used by nearly every major bank in the US for many years to replicate data. See https://geode.apache.org/docs/guide/topologies_and_comm/multi_site_configuration/multisite_topologies.html – Wes Williams Feb 16 '17 at 18:24
  • Depending on your requirements and Region topology, simpler solutions exist, but I would need more info to make an opinionated recommendation. Another consideration is to use Geode's client as a CACHING_PROXY and the server region (or "cache") as a partitioned region. The occasional writes are easily managed by the partitioned region. Your millions-per-second reads are easily handled by the caching client in local memory. All clients get updated when a server update occurs, via guaranteed event subscription. Simpler than CQRS and far fewer nodes. Depends on #clients, #regions, #servers/region; see the sketch below. – Wes Williams Feb 16 '17 at 19:23
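
A minimal sketch of the CACHING_PROXY client approach described in the last comment above, assuming the Geode Java client API; the locator host/port and region name are placeholders:

```java
// A minimal sketch of the CACHING_PROXY client approach, assuming the Geode
// Java client API. The locator host/port and region name are placeholders.
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;

public class ConfigClient {
    public static void main(String[] args) {
        // Subscription must be enabled for the client to receive server-side updates.
        ClientCache cache = new ClientCacheFactory()
            .addPoolLocator("locator-1.internal", 10334)
            .setPoolSubscriptionEnabled(true)
            .create();

        // CACHING_PROXY keeps fetched entries in the client's own memory,
        // so repeated reads never leave the process.
        Region<String, String> config = cache
            .<String, String>createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
            .create("configData");

        // Register interest in all keys so server updates are pushed to this client.
        config.registerInterest("ALL_KEYS");

        String value = config.get("feature.flag.x"); // local after the first fetch
    }
}
```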

Ignite provides native AWS integration for discovery over S3 storage: https://apacheignite-mix.readme.io/docs/amazon-aws. It solves the main issue: you don't need to change the configuration when instances are restarted. In a nutshell, any node that successfully joins the topology writes its coordinates to a bucket (and removes them when it fails or leaves). When you start a new node, it reads this bucket and connects to one of the listed addresses.
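
As a rough sketch, assuming the Ignite Java API with the ignite-aws module on the classpath (the bucket name and credentials below are placeholders), the S3 IP finder is plugged into the discovery SPI like this:

```java
// A minimal sketch, assuming the Ignite Java API with the ignite-aws module on
// the classpath. The bucket name and credentials are placeholders.
import com.amazonaws.auth.BasicAWSCredentials;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder;

public class IgniteS3Node {
    public static void main(String[] args) {
        // Each node registers its address in this bucket and looks peers up there.
        TcpDiscoveryS3IpFinder ipFinder = new TcpDiscoveryS3IpFinder();
        ipFinder.setBucketName("my-ignite-discovery-bucket");
        ipFinder.setAwsCredentials(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

        TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
        discoverySpi.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discoverySpi);

        // Start the node; it discovers the rest of the cluster via the S3 bucket.
        Ignite ignite = Ignition.start(cfg);
        IgniteCache<String, String> config = ignite.getOrCreateCache("configData");
        config.put("feature.flag.x", "enabled");
    }
}
```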

Valentin Kulichenko

Hazelcast's Replicated Map will not work for your use case. Note that it is a map that is replicated across all the cluster's nodes, not on the client nodes/servers. Also, as you said, it is not fully reliable yet.
Here is the Hazelcast solution:

  1. Create a Hazelcast cluster with a set of nodes depending upon the size of data.
  2. Create a distributed map (IMap) and tweak the count & eviction configurations based on the size/number of key/value pairs. The data gets partitioned across all the nodes.
  3. Set the backup count based on how critical the data is and how much time it takes to pull the data from the actual source (DB/files). Distributed maps have 1 backup by default.
  4. On the client side, set up a NearCache and attach it to the distributed map. This NearCache will hold the key/value pairs on the local/client side itself, so get operations complete within milliseconds (see the sketch after this list).
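
As a rough sketch of steps 2–4 from the client side, assuming the Hazelcast 3.x Java client API (the map name "configMap" and the TTL value are placeholders):

```java
// A minimal sketch of steps 2-4 from the client side, assuming the Hazelcast 3.x
// Java client API. The map name "configMap" and the TTL value are placeholders.
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.NearCacheConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class HazelcastConfigClient {
    public static void main(String[] args) {
        ClientConfig clientConfig = new ClientConfig();

        // Keep entries in the client's own memory, invalidate them when the
        // cluster-side entry changes, and expire them as a safety net.
        NearCacheConfig nearCacheConfig = new NearCacheConfig("configMap");
        nearCacheConfig.setInvalidateOnChange(true);
        nearCacheConfig.setTimeToLiveSeconds(300);
        clientConfig.addNearCacheConfig(nearCacheConfig);

        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);

        // Reads are served from the NearCache after the first fetch;
        // writes go to the partitioned map on the cluster.
        IMap<String, String> config = client.getMap("configMap");
        String value = config.get("feature.flag.x");
    }
}
```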

Things to consider with NearCache solution:

  • The first get operation will be slower, as it has to go over the network to fetch the data from the cluster.
  • Cache invalidation is not fully reliable, as there will be a delay in synchronization with the cluster, and you may end up reading stale data. Then again, this is the same across all caching solutions.
  • It is the client's responsibility to set up timeouts and invalidation of NearCache entries so that future pulls get fresh data from the cluster. This depends on how often the data gets refreshed or a value is replaced for a key.
A.K.Desai