I have an app with hundreds of horizontally scaled servers which uses redis pub/sub, and it works just fine.
The redis server is a central point of failure. Whenever redis fails (well, it happens sometimes), our application falls into inconsistent state and have to follow recovery process which takes time. During this time the entire app is hardly useful.
Is there any messaging system/framework option, similar to redis pub/sub, but with redundancy and high availability so that if one instance fails, other will continue to deliver the messages exchanged between application hosts?
Or, better, is there any distributed messaging system in which app instances exchange the messages in a peer-to-peer manner, so that there is no single point of failure?