14

I have a Java web server and am currently using the Guava library to handle my in-memory caching, which I use heavily. I now need to expand to multiple servers (2+) for failover and load balancing. In the process, I switched from an in-process cache to Memcache (an external service). However, I'm not terribly impressed with the results: for nearly every call, I now have to make an external call to another server, which is significantly slower than the in-memory cache.

I'm thinking that instead of getting the data from Memcache, I could keep using a local cache on each server and use RabbitMQ to notify the other servers when their caches need to be updated. So if one server makes a change to the underlying data, it would also broadcast a message to all the other servers telling them their cached copy is now invalid. Every server would both broadcast and listen for cache invalidation messages.
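
Roughly, this is what I have in mind (just a sketch, not production code; the exchange name, cache sizing, and the key-per-message format are placeholders I made up):

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.rabbitmq.client.BuiltinExchangeType;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;

import java.nio.charset.StandardCharsets;

public class BroadcastInvalidatingCache {

    private static final String EXCHANGE = "cache-invalidation"; // placeholder name

    // The same local in-process cache I use today.
    private final Cache<String, Object> cache =
            CacheBuilder.newBuilder().maximumSize(100_000).build();

    private final Channel channel;

    public BroadcastInvalidatingCache(Connection connection) throws Exception {
        channel = connection.createChannel();
        // Fanout exchange: every bound queue receives every message.
        channel.exchangeDeclare(EXCHANGE, BuiltinExchangeType.FANOUT);
        // Each server gets its own anonymous, auto-delete queue.
        String queue = channel.queueDeclare().getQueue();
        channel.queueBind(queue, EXCHANGE, "");
        channel.basicConsume(queue, true, (tag, delivery) -> {
            String key = new String(delivery.getBody(), StandardCharsets.UTF_8);
            cache.invalidate(key); // evict locally; the next read re-fetches
        }, tag -> { });
    }

    // Called by whichever server changes the underlying data.
    public void invalidateEverywhere(String key) throws Exception {
        cache.invalidate(key); // evict here immediately
        channel.basicPublish(EXCHANGE, "", null,
                key.getBytes(StandardCharsets.UTF_8)); // tell everyone else
    }
}
```

The publisher's own queue is bound to the exchange too, so it would see its own message back; that just invalidates an already-evicted key, which seems harmless.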

Does anyone know any potential pitfalls of this approach? I'm a little nervous because I can't find anyone else that is doing this in production. The only problems I see would be that each server needs more memory (in-memory cache), and it might take a little longer for any given server to get the updated data. Anything else?

CrazyDevMan
  • I found this [article](http://java.dzone.com/articles/process-caching-vs-distributed) on DZone helpful, but it did not address messaging other systems to keep caches consistent. – CrazyDevMan Jan 18 '14 at 16:13
  • Please elaborate on your setup. There is an issue with your cache/server configuration if you are seeing response times of 250-350 ms. – theMayer Jan 21 '14 at 21:03
  • We have a similar setup in our system, and it works. The only difference is that we use notifications to signal that an entry needs to be evicted (for simplicity). In that case, the notified instance needs to re-query persistent storage to get the most recent value and then update its local in-memory cache. But it all depends on how frequently the evicted item is accessed. – Grandys Jan 30 '23 at 16:05

2 Answers

8

I am a little bit confused about your problem here, so I am going to restate in a way that makes sense to me, then answer my version of your question. Please feel free to comment if I am not in line with what you are thinking.

You have a web application that uses a process-local memory cache for data. You want to expand to multiple nodes and keep this same structure for your program, rather than rely upon a 3rd party tool (memcached, Couchbase, Redis) with built-in cache replication. So, you are thinking about rolling your own using RabbitMQ to publish the changes out to the various nodes so they can update the local cache accordingly.

My initial reaction is that what you want to do is best done by rolling over to one of the above-mentioned tools. Aside from the obvious development effort and rigorous testing a custom solution would require, Couchbase, Memcached, and Redis were all designed to solve exactly the problem you have.

Also, in theory you would run out of available memory on your application nodes as you scale horizontally, and then you would really have a mess. Once this limitation makes your app infeasible, you will end up using one of those tools anyway, at which point all your hard work designing a custom solution will have been for naught.

The only exception to this I can think of is if your app is heavily compute-intensive and does not use much memory. In that case, a RabbitMQ-based solution is easy to build, but you would need some procedure in place to synchronize the caches between the servers on occasion, should messages be missed in RMQ. You would also need a way to handle node startup and shutdown.
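
For what it's worth, if each node consumes from its own auto-delete queue, shutdown mostly takes care of itself (a dead node's queue disappears with it), and a node that starts up begins with an empty cache, so it is consistent by construction. The missed-message case is the one you have to engineer for; a periodic full re-sync bounds how stale an entry can get. A minimal sketch, where `reloadHotEntries` is a placeholder for however you re-read authoritative data:

```java
import com.google.common.cache.Cache;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CacheResync {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Safety net: even if an invalidation message is lost, a stale
    // entry survives at most one sync interval.
    public CacheResync(Cache<String, Object> cache, Runnable reloadHotEntries) {
        scheduler.scheduleAtFixedRate(() -> {
            cache.invalidateAll();   // drop everything...
            reloadHotEntries.run();  // ...then re-warm from the database (placeholder)
        }, 5, 5, TimeUnit.MINUTES);
    }

    public void shutdown() {
        scheduler.shutdown();
    }
}
```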

Edit

Given your statement in the comments that you are seeing access times in the hundreds of milliseconds, I'm going to advise that you first examine your setup. Typical read times for a single item from a Memcached (or Couchbase, or Redis, etc.) instance are sub-millisecond (somewhere around 0.1 milliseconds, if I remember correctly), so your "problem child" of a cache server is several orders of magnitude slower than it should be. Start there, then see if you still have the same problem.
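
A quick way to check is to time a single warmed-up read from your Java app (a sketch using the spymemcached client; the key, host, and port are made up, so substitute whatever your setup uses):

```java
import net.spy.memcached.MemcachedClient;

import java.net.InetSocketAddress;

public class MemcachedLatencyProbe {
    public static void main(String[] args) throws Exception {
        MemcachedClient client =
                new MemcachedClient(new InetSocketAddress("localhost", 11211));
        client.set("probe", 60, "value").get(); // make sure the key exists

        client.get("probe"); // warm up the connection
        long start = System.nanoTime();
        long micros;
        client.get("probe");
        micros = (System.nanoTime() - start) / 1_000;
        // On a healthy LAN setup, expect something on the order of 100 µs.
        System.out.println("single get took " + micros + " µs");
        client.shutdown();
    }
}
```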

theMayer
  • Makes sense, but it's not exactly my problem. My original cache is **local to the process** (think of Guava cache as a map or a dictionary). It does not have any built-in mechanism to expand to multiple nodes. Memcache does have this feature, which is why I switched to it, but now the cache is not local to my web server; it's on a different machine, which slows things down. – CrazyDevMan Jan 18 '14 at 16:04
  • Is there some reason you can't run memcached on the web server box? – Chris Johnson Jan 18 '14 at 16:24
  • Some memcached clients keep connections to the servers open to optimize response time; I'm struggling to see how a few tenths of a millisecond could cause a noticeable degradation in performance in your use case. Could you please elaborate? – theMayer Jan 19 '14 at 22:55
  • @ChrisJohnson I really like your suggestion to run memcached on the same server, I will try that. – CrazyDevMan Jan 21 '14 at 18:24
  • @rmayer06 It's about 150 to 200 ms for each call to memcache (nearly all that time is spent on the network request). Since I'm only storing the models in the cache right now, some requests require 2 calls (serialized) because I need the first model to figure out what model to get for the second call. – CrazyDevMan Jan 21 '14 at 18:31
  • 150 to 200ms is several orders of magnitude greater than what it should be. Something is off with your setup. You aren't trying to access it over a WAN, by chance? Also, I would use Couchbase as your memcached server, it is proven and commercially-supported. Just my 2 cents there. – theMayer Jan 21 '14 at 21:00
  • +1 For some great points. Theoretically, an advantage of this approach would be to avoid serialization of objects, assuming the caches are going to live on the same machines as the application. There is an overwhelming list of disadvantages though, some of which you address here. I posted the bounty out of curiosity to see if the idea had ever been tried in some shape or form, but I guess it's untenable. – Paul Bellora Jan 23 '14 at 04:16
  • Thanks Paul! I actually do use some local data caching in my application, but it re-syncs itself every 5 minutes and I wouldn't consider it in a multi-user scenario such as a web app. – theMayer Jan 23 '14 at 11:26
1

We're using something similar for data that is read-only and doesn't need to be updated every time. I doubt this is a good plan for you. Keep in mind that you would need one more service on each instance to monitor the queue and apply changes to the in-memory store, and that is very hard to test.

Are you sure that most of the time is spent on communication between your servers? Maybe you are making multiple calls?
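
If multiple round trips are the culprit, batching the keys into a single request may already help (a sketch with spymemcached's getBulk; the keys are made up for illustration, and this only works when one key doesn't depend on another's result):

```java
import net.spy.memcached.MemcachedClient;

import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.Map;

public class BulkFetchExample {
    public static void main(String[] args) throws Exception {
        MemcachedClient client =
                new MemcachedClient(new InetSocketAddress("localhost", 11211));
        // One network round trip for several keys instead of one trip per key.
        Map<String, Object> values =
                client.getBulk(Arrays.asList("user:42", "settings:42"));
        System.out.println(values);
        client.shutdown();
    }
}
```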

Rustem