We are building a Java application that will use embedded Neo4j for graph traversal. These are the reasons we want to use the embedded version instead of a centralized server:
- This app is not the data owner. Data will be ingested into it through another app. Keeping the data locally will let us do calculations quickly, which will improve our API SLA.
- Since the data footprint is small, we don't want to maintain a centralized server, which would incur additional cost and maintenance.
- There is no need for an additional cache.
This architecture brings two challenges. First, how do we update the data in all instances of the embedded Neo4j application at the same time? Second, how do we make sure that all instances are in sync, i.e. using the same version of the data?
We thought of using Kafka to solve the first problem. The idea is to have a Kafka listener with a different group ID in each instance (to ensure that every instance gets every update). Whenever there is an update, an event will be posted to Kafka. All instances will listen for the event and perform the update operation.
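A minimal sketch of the per-instance group ID idea, assuming the plain Kafka Java client is used. The class name `InstanceConsumerConfig`, the `graph-updates-` prefix, and the `instanceId` parameter are hypothetical. The property keys are standard Kafka consumer settings, and the resulting `Properties` would be passed to a `KafkaConsumer`.

```java
import java.util.Properties;

// Hypothetical helper: builds consumer settings so that each app instance
// forms its own consumer group and therefore receives every update event.
final class InstanceConsumerConfig {

    // instanceId is assumed to be unique and stable per running instance
    // (for example a host name), so committed offsets survive a restart.
    static Properties forInstance(String bootstrapServers, String instanceId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // A distinct group.id per instance: no partition sharing between instances.
        props.setProperty("group.id", "graph-updates-" + instanceId);
        // A group with no committed offset starts from the oldest retained event.
        props.setProperty("auto.offset.reset", "earliest");
        // Offsets are meant to be committed by the app after the graph update succeeds.
        props.setProperty("enable.auto.commit", "false");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

With a stable `instanceId`, an instance that restarts keeps the same group and resumes from its last committed offset. A random ID per start would instead create a fresh group each time.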
However, we still don't have a solid design for the second problem. For various reasons, one of the instances can miss an event (for example, because its consumer is down). One option is to keep checking the latest version by calling the API of the data owner app and, if the local version is behind, replay the events. But this brings the additional complexity of maintaining an event log of all updates. Can this be done in a better and simpler way?
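To make the version-check-and-replay option concrete, here is a rough sketch in plain Java. It assumes that every update event carries a version number and that versions increase by exactly one per update. `UpdateEvent`, `VersionedGraphState`, and the `eventsAfter` lookup are hypothetical names, with `eventsAfter` standing in for whatever would serve the event log.

```java
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical event shape: each update carries the data version it produces.
final class UpdateEvent {
    final long version;
    final String payload;

    UpdateEvent(long version, String payload) {
        this.version = version;
        this.payload = payload;
    }
}

// Tracks the version applied locally and detects missed events.
final class VersionedGraphState {
    private long localVersion;

    VersionedGraphState(long startingVersion) {
        this.localVersion = startingVersion;
    }

    long localVersion() {
        return localVersion;
    }

    // True when the data owner reports a newer version than this instance applied.
    boolean isBehind(long ownerVersion) {
        return ownerVersion > localVersion;
    }

    // Applies an event only if it is the next version in sequence.
    // Returns false for duplicates and gaps so the caller can trigger a replay.
    boolean apply(UpdateEvent event) {
        if (event.version != localVersion + 1) {
            return false;
        }
        // The embedded Neo4j write transaction would go here (omitted in this sketch).
        localVersion = event.version;
        return true;
    }

    // Replays missed events in order and returns how many were applied.
    int catchUp(long ownerVersion, LongFunction<List<UpdateEvent>> eventsAfter) {
        if (!isBehind(ownerVersion)) {
            return 0;
        }
        int applied = 0;
        for (UpdateEvent event : eventsAfter.apply(localVersion)) {
            if (apply(event)) {
                applied++;
            }
        }
        return applied;
    }
}
```

An instance would call `isBehind` with the version returned by the data owner's API, and call `catchUp` only when it returns true. Rejecting out-of-sequence events in `apply` also surfaces a gap as soon as the next live event arrives.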