My use case is as follows:
We have about 500 servers running in an autoscaling EC2 cluster that all need to access the same configuration data (laid out in a key/value fashion) several million times per second.
The configuration data isn't very large (1 or 2 GB) and doesn't change much (a few dozen updates/deletes/inserts per minute during peak time).
Latency is critical for us, so the data needs to be replicated and kept in memory on every single instance running our application.
Eventual consistency is fine, but we need to make sure that every update is propagated at some point, knowing that servers can be shut down at any time. Update propagation across the cluster should be reliable and easy to set up: we can't have static IPs for our servers, and we don't want to go down the route of "faking" multicast on AWS, etc.
Here are the solutions we've explored in the past:
- Using regular Java maps and our custom-built system to propagate updates across the cluster. (Obviously, it doesn't scale that well.)
- Using Ehcache and its replication feature, but setting it up on EC2 is very painful and somewhat unreliable. (See the config sketch below.)
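To illustrate the pain, this is roughly the kind of RMI replication config we ended up with (a trimmed sketch; the cache name, port, and addresses are placeholders). Automatic peer discovery relies on multicast, which EC2 doesn't support, so we had to fall back to manual discovery, where every node must list every other peer's address, and that doesn't mix with autoscaling:

```xml
<!-- ehcache.xml (sketch): RMI replication with manual peer discovery.
     Each node has to list the rmiUrls of all OTHER peers, so the file
     differs per instance and breaks as soon as the cluster scales. -->
<cacheManagerPeerProviderFactory
    class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
    properties="peerDiscovery=manual,
                rmiUrls=//10.0.0.1:40001/configCache|//10.0.0.2:40001/configCache"/>

<cache name="configCache" maxEntriesLocalHeap="10000000" eternal="true">
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
        properties="replicateAsynchronously=true"/>
</cache>
```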
Here are the solutions we're thinking of trying out:
- Apache Ignite (https://ignite.apache.org/) with a REPLICATED strategy.
- Hazelcast's Replicated Map feature (http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#replicated-map).
- Apache Geode on every application node (http://geode.apache.org/).
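To make it concrete, here are minimal sketches of how I'd expect to wire up each candidate. Class names come from each project's public API; the "config" map/region name and the keys are just placeholders.

Ignite, with a cache in REPLICATED mode so every node holds a full copy:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

CacheConfiguration<String, String> cacheCfg = new CacheConfiguration<>("config");
cacheCfg.setCacheMode(CacheMode.REPLICATED);     // full copy on every node

Ignite ignite = Ignition.start();                // discovery configured separately
IgniteCache<String, String> cache = ignite.getOrCreateCache(cacheCfg);
cache.put("feature.x.enabled", "true");          // propagated to the whole cluster
String v = cache.get("feature.x.enabled");       // served from local memory
```

Hazelcast, with a ReplicatedMap (3.x API; weakly consistent, full copy per member):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ReplicatedMap;

HazelcastInstance hz = Hazelcast.newHazelcastInstance();
ReplicatedMap<String, String> config = hz.getReplicatedMap("config");
config.put("feature.x.enabled", "true");         // async-replicated to every member
String v = config.get("feature.x.enabled");      // local read, no network hop
```

Geode, embedded in the application with a REPLICATE region (older releases use the com.gemstone.gemfire packages instead):

```java
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

Cache cache = new CacheFactory().create();       // peer-to-peer member
Region<String, String> region = cache
    .<String, String>createRegionFactory(RegionShortcut.REPLICATE)
    .create("config");
region.put("feature.x.enabled", "true");         // replicated to all members
```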
I would like to know whether each of those solutions would work for our use case and, if so, what issues I'm likely to face with each of them.
Here is what I found so far:
- Hazelcast's Replicated Map is somewhat recent and still a bit unreliable (async updates can be lost when scaling down).
- It seems like Geode only became "stable" fairly recently (even though it has supposedly been in development since the early 2000s).
- Ignite looks like it could be a good fit, but I'm not too sure how its S3-based discovery will work out if we keep adding/removing nodes regularly. (See the sketch below.)
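For reference, this is the S3 discovery setup I have in mind, using TcpDiscoveryS3IpFinder from the ignite-aws module (the bucket name and credentials are placeholders). Each node publishes its address to a shared bucket on startup; my concern is how stale entries left behind by terminated instances behave under constant churn:

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder;
import com.amazonaws.auth.BasicAWSCredentials;

// Nodes find each other through a shared S3 bucket instead of
// multicast or a static IP list.
TcpDiscoveryS3IpFinder ipFinder = new TcpDiscoveryS3IpFinder();
ipFinder.setBucketName("my-ignite-discovery");   // placeholder bucket
ipFinder.setAwsCredentials(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
discoSpi.setIpFinder(ipFinder);

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setDiscoverySpi(discoSpi);
Ignition.start(cfg);
```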
Thanks!