AxonIQ process new events while replaying older ones

Question

Every now and then, we have to replay events from the Axon event store to update our projections. This replay takes several hours, and it will only take longer as time goes by. That means our application is not processing new events while the replay is taking place.

I wonder if there is a way to keep processing new events while a replay is running. Does Axon provide such a mechanism, or do we have to create two distinct event handlers: one for "normal" processing and one for replay?

Steven · Answer 1 · 2021-05-03T08:04:51.183

Axon does not provide a solution to this out of the box. There are roughly speaking two approaches you can take:

1. Do a form of blue/green deployment. Nine times out of ten, a replay is required to comply with a new Query Model format constructed during development. If you use a blue/green deployment style, you can simply warm up/replay the new instance and switch over once it is done.
2. Implement additional components that construct a new TrackingEventProcessor and its contained event handling components when the replay is invoked. Furthermore, a new query model store needs to run in parallel to store the new format, with aliases in place to switch over once the process is done.

I'd wager option 1 requires the least code work but more ops work. Option 2 flips that around, of course.

Note that you will always need to deal with some level of eventual consistency when doing a replay. Events simply keep happening in a system, so reaching the end of the event stream is conceptually not going to happen. Hence you will need to decide when the replay is far enough prior to constructing any solution in this space.
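To make option 2 and the "far enough" decision a bit more concrete, here is a minimal plain-Java sketch. It does not use any Axon API: `Projection`, `Event`, the alias, and the `maxLag` threshold are all hypothetical names invented for illustration. The idea is that the live projection keeps handling new events, a second projection is rebuilt in parallel, and queries only move to the rebuilt one once its lag behind the head of the stream is acceptably small.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Plain-Java model of option 2; none of these types are Axon API. */
final class ParallelReplaySketch {

    /** Hypothetical event: a global stream position plus an amount for one account. */
    static final class Event {
        final long position;
        final String accountId;
        final long amount;

        Event(long position, String accountId, long amount) {
            this.position = position;
            this.accountId = accountId;
            this.amount = amount;
        }
    }

    /** Hypothetical query model that remembers how far it has read the stream. */
    static final class Projection {
        private final Map<String, Long> balances = new HashMap<>();
        private long lastPosition = -1;

        void apply(Event event) {
            balances.merge(event.accountId, event.amount, Long::sum);
            lastPosition = event.position;
        }

        long balanceOf(String accountId) {
            return balances.getOrDefault(accountId, 0L);
        }

        long lastPosition() {
            return lastPosition;
        }
    }

    /** The alias that queries read through; switching it is a single atomic write. */
    private final AtomicReference<Projection> alias;

    ParallelReplaySketch(Projection live) {
        this.alias = new AtomicReference<>(live);
    }

    Projection active() {
        return alias.get();
    }

    /** "Far enough" is a business decision, modelled here as a maximum lag in events. */
    static boolean farEnough(Projection candidate, long headPosition, long maxLag) {
        return headPosition - candidate.lastPosition() <= maxLag;
    }

    /** Points the alias at the rebuilt projection, but only once it is far enough. */
    boolean switchIfFarEnough(Projection candidate, long headPosition, long maxLag) {
        if (!farEnough(candidate, headPosition, maxLag)) {
            return false;
        }
        alias.set(candidate);
        return true;
    }
}
```

In a real setup the two projections would be fed by two separate event processors and the alias would live in the query model store (a table or index alias, for example), so treat this purely as a model of the switch-over logic.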

Lastly, there are means to improve event handling speed, and thus replay speed as well. This blog gives a quick overview of what you can do to this end. In short: adjust the event handling batchSize, adjust the number of processing threads and segments, and tie into Axon's UnitOfWork to improve database access.
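As a rough illustration of why the batch size matters: assuming each batch of events is handled in a single transaction, the number of commits during a replay is the event count divided by the batch size, rounded up. The sketch below is plain arithmetic, not Axon configuration, and `BatchCommitSketch` and `commitsFor` are made-up names.

```java
/** Back-of-the-envelope model; assumes one database commit per batch of events. */
final class BatchCommitSketch {

    /** Commits needed when every batch is committed once: ceil(eventCount / batchSize). */
    static long commitsFor(long eventCount, int batchSize) {
        if (eventCount < 0 || batchSize < 1) {
            throw new IllegalArgumentException("eventCount must be >= 0 and batchSize >= 1");
        }
        return (eventCount + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long events = 1_000_000_000L;
        System.out.println(commitsFor(events, 1));   // one commit per event
        System.out.println(commitsFor(events, 500)); // one commit per 500 events
    }
}
```

Under that assumption, a replay of 1 billion events goes from 1,000,000,000 commits at a batch size of 1 to 2,000,000 commits at a batch size of 500. How much wall-clock time that saves depends on your database and handlers, so measure before settling on a value.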

Update

What's good to take into account about the replay functionality is that it is a powerful but heavy tool. Any eventing system (so not just Axon) in which you need to replay billions of events puts strain on the implementation and deployment strategies. Taking this a step further, you should ask whether you should do such a replay at all.

Hence, you will have to determine whether a replay is the best solution for the problem at hand. Sometimes, simply updating your Query Models directly is the most pragmatic solution and a massive timesaver. In other scenarios, it pays to have implemented the enhancements in event handling and deployment strategy that allow quick replays.

I've also seen a lot of domains where a replay did not require a full replay of the event store. If the users of the application only require the last year's worth of models to be present immediately, then it is far simpler to replay only the last year's worth of events. Maybe even the last 3 months will suffice.
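A partial replay comes down to picking a start position from a point in time instead of the beginning of the stream. The sketch below models that in plain Java; `StoredEvent`, `firstIndexAtOrAfter`, and `eventsToReplay` are hypothetical names, and it assumes the events are ordered by timestamp. In Axon Framework 4, as far as I know, the equivalent is resetting a tracking processor to a token the message source creates for a given time (`resetTokens` together with `createTokenAt` or `createTokenSince`), but check that against the version you run.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of a time-windowed replay; none of these types are Axon API. */
final class PartialReplaySketch {

    /** Hypothetical stored event: global position plus the time it was published. */
    static final class StoredEvent {
        final long position;
        final Instant timestamp;

        StoredEvent(long position, Instant timestamp) {
            this.position = position;
            this.timestamp = timestamp;
        }
    }

    /**
     * Binary search for the first event at or after the cutoff.
     * Assumes the list is sorted by timestamp; returns events.size() if none qualifies.
     */
    static int firstIndexAtOrAfter(List<StoredEvent> events, Instant cutoff) {
        int low = 0;
        int high = events.size();
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (events.get(mid).timestamp.isBefore(cutoff)) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }

    /** Returns only the events inside the window ending at 'now', e.g. the last 365 days. */
    static List<StoredEvent> eventsToReplay(List<StoredEvent> events, Instant now, Duration window) {
        int start = firstIndexAtOrAfter(events, now.minus(window));
        return new ArrayList<>(events.subList(start, events.size()));
    }
}
```

Calling `eventsToReplay(events, Instant.now(), Duration.ofDays(365))` would select the last year's worth of events. This only works for query models that can be built from the window alone; a model that aggregates over the full history still needs the older events or a precomputed starting state.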

From a conceptual point of view, a replay can answer any question about your system. That's definitely the good thing about going for an Event Sourcing solution, as you effectively have a single source of truth. But as with any tool, you should use it for the right problem at the right time.

score 0 · Answer 2 · answered Apr 29 '21 at 15:09


Thank you, Steven. The improvements in the blog are interesting, but even then, a 1 billion event stream would take close to 10 hours to replay, and missing new events for such a long period is simply not acceptable. So I guess that even with very good event handling speed, one would still need to tackle the problem by implementing one of the two solutions you propose.

answered Apr 29 '21 at 15:09

Marc



Indeed, you'd have to find some approach that helps in this scenario. I'd wager this is a general "problem" with the replay functionality of any system. You're simply replaying (potentially) millions of events, and there's only so much optimization you can do to make this more efficient. I've added this conceptual pointer to my original response in more detail; let me know what you think. – Steven May 03 '21 at 07:55

It's also good to know that an event store of 1 billion events is likely not going to be replayed every day, or from the beginning of time at all. The oldest event in an event store holding 1 billion events is typically going to be several years old. How often does your system require that first event? It's good to ask yourself these things before hoping to _always_ be able to replay 1 billion+ events on a whim. – Steven May 03 '21 at 08:08
