How to check data integrity/consistency in an asynchronous (event-driven) system

Question

Let's say we have an asynchronous event-driven system in which service A owns some data and publishes events, and service B consumes them and stores a copy of the data in its local DB.

I'd like to be able to check whether the local copy in B is consistent with the source in A. I'm not asking about eventual consistency; I understand that B can lag behind. What I want to verify is that:

The event producing code is correct
We have not lost any messages in transit
The consumer code is correct

All the literature I have seen either deals with eventual consistency or silently assumes that the event processing code does not contain any bugs. To me this is unrealistic. Can somebody point me to where I can learn about the subject, please?

The ideas we had:

The event stream is the source of truth. Now we should check that the source DB and the event stream are in sync. It's doable.
Switch to event sourcing in A. We do not want to completely rewrite A, though.
Expose data on a sync endpoint on A and use it to check the inconsistencies (and fix them if needed).
Ignore it and hope the systems will stay consistent and if not, nobody will notice :-)

When you say "event producing code", aggregates are supposed to produce events when some transaction happens on them. We can test whether the right events are produced. Regarding losing messages in transit, I think message buses offer some reliability guarantees. We can argue that nothing is perfect. So, even if you replace message passing from A to B with a synchronous call from A to B, we can argue that A may fail to call. — Salil, Mar 30 '23 at 01:33
The problem is that the source system is a standard CRUD-based system, so no aggregates for us. With synchronous calls, I at least know that the call failed; in the event-based system we can have data discrepancies and no way to detect them. — Lukas, Mar 31 '23 at 11:02

score 2 · Answer 1 · answered Apr 03 '23 at 20:28

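This is generally handled with monotonically increasing sequence numbers: the producer stamps each message, and the consumer applies messages strictly in that order. A gap in the numbers tells the consumer that a message was lost or has not arrived yet. It definitely can cause a bit more churn, in that every re-ordering of messages will require back-off/retry logic, but it does ensure you have consistency.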
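
As a rough illustration of the sequence-number idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Event` and `Consumer` names, a single producer whose numbering starts at 1, and an in-memory dict standing in for B's local DB. It also buffers out-of-order events rather than using back-off/retry, which is a simplification of what is described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Event:
    seq: int    # per-producer sequence number (assumed to start at 1)
    key: str
    value: str


@dataclass
class Consumer:
    last_seq: int = 0                            # highest sequence number applied so far
    store: dict = field(default_factory=dict)    # stand-in for B's local DB
    pending: dict = field(default_factory=dict)  # out-of-order events held back

    def handle(self, event: Event) -> None:
        if event.seq <= self.last_seq:
            return  # duplicate delivery; already applied
        self.pending[event.seq] = event
        # Apply every buffered event that is now contiguous with what was applied.
        while self.last_seq + 1 in self.pending:
            nxt = self.pending.pop(self.last_seq + 1)
            self.store[nxt.key] = nxt.value
            self.last_seq = nxt.seq

    def missing(self) -> list[int]:
        """Sequence numbers known to exist (a later one arrived) but not yet seen."""
        if not self.pending:
            return []
        upper = max(self.pending)
        return [s for s in range(self.last_seq + 1, upper) if s not in self.pending]
```

If `missing()` stays non-empty for longer than the expected delivery delay, that is a signal that a message was lost in transit and a resync is needed. Note that this only catches lost or reordered messages; it does not detect a producer that never emitted an event for a change in the first place.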

The question is, what do you do when you lack consistency? If you do miss a message, or an error occurs processing it, what's your fallback plan?

Eventual consistency designs often just allow this to happen, and rely on some sort of periodic "clean-up" sync that pulls all the data in bulk and repairs whatever has drifted.
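
A periodic clean-up pass of that kind could look roughly like the sketch below. It is only an illustration under assumptions: `row_digest` and `reconcile` are hypothetical helper names, both sides are represented as plain dicts keyed by record ID, and rows are assumed to be JSON-serializable. In practice the source side would come from a bulk export or sync endpoint on A.

```python
import hashlib
import json


def row_digest(row: dict) -> str:
    """Hash a row in a canonical form so key order does not affect the result."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source: dict, replica: dict) -> dict:
    """Compare A's rows with B's copy and report the differences by record ID."""
    src = {k: row_digest(v) for k, v in source.items()}
    rep = {k: row_digest(v) for k, v in replica.items()}
    return {
        "missing_in_replica": sorted(src.keys() - rep.keys()),
        "extra_in_replica": sorted(rep.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & rep.keys() if src[k] != rep[k]),
    }


# Example with made-up data: record "2" differs, "3" never reached B, "9" is stale in B.
source_rows = {"1": {"name": "a"}, "2": {"name": "b"}, "3": {"name": "c"}}
replica_rows = {"1": {"name": "a"}, "2": {"name": "B"}, "9": {"name": "z"}}
report = reconcile(source_rows, replica_rows)
```

Because B is allowed to lag behind, a record that changed very recently may show up as a mismatch without being a real bug. One way to handle that is to re-check reported records after a delay before treating them as genuine discrepancies.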
