I'm working on a generic CQRS + ES framework (with Node.js) at my company. Remark: only an RDBMS + Redis (without AOF/RDB persistence) is allowed, due to constraints I can't change.
I really need some advice on how to implement the CQRS + ES framework. Ignoring the ES part, I'm struggling with the implementation of the message propagation.
Here are the tables I have in the RDBMS.
EventStore: [aggregateId (varchar), aggregateType (varchar), aggregateVersion (bigint), messageId (varchar), messageData (varchar), messageMetadata (varchar), sequenceNumber (bigint)]
EventDelivery: [messageId (varchar, foreign key to EventStore), sequenceId (equal to aggregateId, varchar), sequenceNumber (equal to the one in EventStore, bigint)]
ConsumerGroup: [consumerGroup (varchar), lastSequenceNumberSeen (bigint)]
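For clarity, the row shapes roughly correspond to these TypeScript interfaces (illustration only; the field names follow the tables above):

```typescript
// Row shapes mirroring the three tables above (illustration only).
interface EventStoreRow {
  aggregateId: string;
  aggregateType: string;
  aggregateVersion: number;   // bigint in the DB
  messageId: string;
  messageData: string;        // serialized event payload
  messageMetadata: string;
  sequenceNumber: number;     // global, monotonically increasing
}

interface EventDeliveryRow {
  messageId: string;          // FK to EventStore.messageId
  sequenceId: string;         // equal to aggregateId
  sequenceNumber: number;     // copied from EventStore
}

interface ConsumerGroupRow {
  consumerGroup: string;
  lastSequenceNumberSeen: number;
}
```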
And I have multiple EventSubscribers:
// In Application 1
@EventSubscriber("consumerGroup1", AccountOpenedEvent)
...
// In Application 2
@EventSubscriber("consumerGroup2", AccountOpenedEvent)
...
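In case it matters, the decorator roughly registers handlers into an in-memory registry like this (a simplified sketch, not my real code; it uses legacy/experimental TypeScript decorators):

```typescript
// Simplified sketch of @EventSubscriber: it records which handler belongs to
// which (consumerGroup, eventType) pair so a poller can dispatch to it later.
type EventHandler = (event: any) => Promise<void>;

interface Subscription {
  consumerGroup: string;
  eventType: string;
  handler: EventHandler;
}

const subscriptions: Subscription[] = [];

function EventSubscriber(consumerGroup: string, eventType: { name: string }) {
  return (target: any, propertyKey: string, descriptor: PropertyDescriptor) => {
    subscriptions.push({
      consumerGroup,
      eventType: eventType.name,
      handler: descriptor.value,
    });
  };
}

class AccountOpenedEvent {
  constructor(public accountId: string) {}
}

// Usage as in the examples above:
class AccountProjection {
  @EventSubscriber("consumerGroup1", AccountOpenedEvent)
  async onAccountOpened(event: AccountOpenedEvent) {
    // update the read model for application 1 here
  }
}
```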
Here is the flow when an AccountOpenedEvent is written to the EventStore table (sketched in code after this list):

1. For each application (i.e. application 1 and application 2), it scans the codebase to obtain all the @EventSubscriber declarations, creates a consumer group in the ConsumerGroup table with lastSequenceNumberSeen = 0, then runs a scheduler (with a 100ms polling interval) that polls all the events it is interested in (grouped by consumer group) from EventStore with the condition sequenceNumber >= lastSequenceNumberSeen.
2. For each event (EventStore) from step 1, calculate the sequenceId (here the sequenceId is equal to aggregateId); this sequenceId (together with the sequenceNumber) is used to guarantee the message delivery ordering. Persist it into the EventDelivery table, and update lastSequenceNumberSeen = sequenceNumber (this is to prevent the same event from being scanned again in the next interval).
3. For each application (i.e. application 1 and application 2), we have another scheduler (also with a 100ms polling interval) that polls the EventDelivery table (grouped by sequenceId and ordered by sequenceNumber ASC).
4. For each event (EventDelivery) from step 3, call the corresponding message handler; after the message is handled, acknowledge the message by deleting the record from EventDelivery.
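Put together, the two schedulers look roughly like this (a minimal sketch, not my real code; query is a stand-in for whatever RDBMS client we use, the SQL is illustrative, and error handling is omitted):

```typescript
// Sketch of the two 100ms pollers described above. `query` is a placeholder
// for the real RDBMS client (e.g. pg or mysql2); error handling is omitted.
async function query(sql: string, params: unknown[] = []): Promise<any[]> {
  /* delegate to the actual database driver here */
  return [];
}

const POLL_INTERVAL_MS = 100;

// Scheduler 1: copy new EventStore rows into EventDelivery for one consumer group.
// (Filtering by the event types the group is actually interested in is omitted.)
async function pollEventStore(consumerGroup: string) {
  const groups = await query(
    "SELECT lastSequenceNumberSeen FROM ConsumerGroup WHERE consumerGroup = ?",
    [consumerGroup]
  );
  const lastSeen = groups[0]?.lastSequenceNumberSeen ?? 0;

  // Using > rather than >= so the last seen event is not scanned again.
  const events = await query(
    "SELECT * FROM EventStore WHERE sequenceNumber > ? ORDER BY sequenceNumber ASC",
    [lastSeen]
  );

  for (const e of events) {
    // sequenceId = aggregateId, so ordering per aggregate is preserved.
    await query(
      "INSERT INTO EventDelivery (messageId, sequenceId, sequenceNumber) VALUES (?, ?, ?)",
      [e.messageId, e.aggregateId, e.sequenceNumber]
    );
    await query(
      "UPDATE ConsumerGroup SET lastSequenceNumberSeen = ? WHERE consumerGroup = ?",
      [e.sequenceNumber, consumerGroup]
    );
  }
}

// Scheduler 2: deliver pending EventDelivery rows in order per sequenceId,
// then acknowledge each one by deleting the row.
async function pollEventDelivery(dispatch: (messageId: string) => Promise<void>) {
  const pending = await query(
    "SELECT * FROM EventDelivery ORDER BY sequenceId ASC, sequenceNumber ASC"
  );
  for (const d of pending) {
    await dispatch(d.messageId); // look up and invoke the registered handler
    await query("DELETE FROM EventDelivery WHERE messageId = ?", [d.messageId]);
  }
}

setInterval(() => pollEventStore("consumerGroup1"), POLL_INTERVAL_MS);
setInterval(() => pollEventDelivery(async () => { /* handle the event */ }), POLL_INTERVAL_MS);
```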
Since I have 2 applications, I have to separate the delivery of the AccountOpenedEvent in EventStore into 2 transactions; and since the 2 applications don't know about each other, I can only do it passively. That's why I need the EventDelivery table and the polling schedulers.
Assume I can use redlock + cron to make sure only one instance runs the polling jobs, in case application 1 has more than one replica.
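Concretely, I imagine each replica doing something like this (a sketch assuming the node-redlock v5 and ioredis packages; the lock key, TTL, and interval are arbitrary):

```typescript
import Redis from "ioredis";
import Redlock from "redlock";

const redis = new Redis();
const redlock = new Redlock([redis], { retryCount: 0 }); // don't wait if another replica holds the lock

async function pollingTick() {
  let lock;
  try {
    // Only one replica acquires the lock for this tick; the others just skip it.
    lock = await redlock.acquire(["locks:poller:consumerGroup1"], 5000);
  } catch {
    return; // another replica is doing the polling right now
  }
  try {
    // await pollEventStore("consumerGroup1");
    // await pollEventDelivery(...);
  } finally {
    await lock.release();
  }
}

setInterval(pollingTick, 100);
```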
Application 1 will poll the AccountOpenedEvent, create a record in EventDelivery, and store the lastSequenceNumberSeen in its consumer group.
Application 2 will also poll the AccountOpenedEvent, create a record in EventDelivery, and store the lastSequenceNumberSeen in its consumer group.
Since application 1 and application 2 belong to different consumer groups, they treat the event store stream separately.
Here is the first problem: we have 2 schedulers, and we would have more if there were more consumer groups, and all of them put a heavy load on the database. How do I solve this? One of my ideas is to convert these 2 schedulers into jobs and put the jobs into a queue, where the queue processes the jobs per interval (let's say 100ms), but this seems to introduce large latency if a job is unfortunately placed at the end of the queue.
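What I had in mind for the "jobs in a queue" idea is roughly this (a sketch with a plain in-memory array, just to show where the latency worry comes from):

```typescript
// One shared timer drains a queue of poll jobs, instead of each consumer group
// running its own scheduler against the database every 100ms.
type PollJob = () => Promise<void>;

const jobQueue: PollJob[] = [];

function enqueuePollJob(job: PollJob) {
  jobQueue.push(job);
}

// Only one job runs per 100ms tick, so with N consumer groups a job at the end
// of the queue waits roughly N * 100ms, which is the latency problem described above.
setInterval(async () => {
  const job = jobQueue.shift();
  if (!job) return;
  await job();
  jobQueue.push(job); // re-enqueue so the job runs again on a later tick
}, 100);

// enqueuePollJob(() => pollEventStore("consumerGroup1"));
// enqueuePollJob(() => pollEventStore("consumerGroup2"));
```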
Here is the second problem: in the above flow, I introduced the second polling job to guarantee the message delivery ordering. But unlike the first one, it has no lastSequenceNumberSeen; the second polling job only removes a record from EventDelivery once the message has been handled. And it is common for a message to take more than 100ms to handle. In that case, the same event in EventDelivery will be scanned again.
I'm not sure what the common practice is, and I'm really struggling with how to implement this. I did lots of research on the internet. I see that some implementations do the message propagation with Debezium + Kafka (although I cannot use these 2 tools, I still don't fully understand how it works).
I know Debezium uses a CDC approach to tail the transaction log of the RDBMS and forward the messages to Kafka. And I have seen recommendations that we should not have multiple subscriptions on the same transaction log. Let's say Debezium guarantees that every event is propagated to Kafka; that means I need application 1 and application 2 to subscribe to the Kafka topic, each belonging to a different consumer group (and using aggregateId as the partition key). Since Kafka guarantees the message ordering (per partition), everything should work fine. But I don't think Kafka stores all the messages from the very beginning. Let's say it is configured to retain 1,000,000 messages: if a message handler keeps failing for some unexpected reason, the 1,000,000 messages after the failed one cannot be handled, and the 1,000,001st event will get lost... Although this is a rare case, I'm not sure I understand it correctly. The database table is the most reliable source of truth, as it stores all the events from the very beginning. If the system suffers from this case, does that mean I need to manually republish all the events to Kafka to recover the projection model?
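For reference, my understanding of the consumer side of the Debezium + Kafka setup is roughly this (a sketch using the kafkajs package; the topic name and broker address are made up, and Debezium would be the one producing to the topic with aggregateId as the message key):

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "application-1", brokers: ["kafka:9092"] });

// Each application uses its own groupId, so both receive every event.
// With aggregateId as the message key, all events of one aggregate land in the
// same partition, and Kafka preserves their order within that partition.
const consumer = kafka.consumer({ groupId: "consumerGroup1" });

async function run() {
  await consumer.connect();
  // fromBeginning only reaches as far back as the topic retention allows,
  // which is exactly my concern about losing older events.
  await consumer.subscribe({ topics: ["account-events"], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value?.toString() ?? "{}");
      // look up and call the handler registered for this event type
    },
  });
}

run().catch(console.error);
```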
And another case: if I have a new event subscriber which needs the historical events to build its projection model, then with Debezium + Kafka we need to assign a new consumerGroup and configure it to read the Kafka stream from the very beginning? It has the same problem: that consumerGroup can only get the last 1,000,000 events... But this is not an issue if we poll the database table directly instead.
I don't understand why most implementations don't poll the database table but use a message broker instead.
And I really need advice on how to implement a CQRS + ES framework, especially the message propagation part (keep in mind that I can only use an RDBMS + Redis without persistence).