I'm working on a generic CQRS + ES framework (with Node.js) at my company. Remark: only an RDBMS + Redis (without AOF/RDB persistence) is allowed, due to constraints I can't change.
I really need some advice on how to implement the CQRS + ES framework. Ignoring the ES part, I'm struggling with the implementation of the message propagation.
Here are the tables I have in the RDBMS.
EventStore: [aggregateId (varchar), aggregateType (varchar), aggregateVersion (bigint), messageId (varchar), messageData (varchar), messageMetadata (varchar), sequenceNumber (bigint)]
EventDelivery: [messageId (varchar, foreign key to EventStore), sequenceId (equal to aggregateId, varchar), sequenceNumber (equal to the one in EventStore, bigint)]
ConsumerGroup: [consumerGroup (varchar), lastSequenceNumberSeen (bigint)]
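For clarity, the row shapes roughly correspond to these TypeScript interfaces (illustration only; the field names follow the tables above):

```typescript
// Row shapes mirroring the three tables above (illustration only).
interface EventStoreRow {
  aggregateId: string;
  aggregateType: string;
  aggregateVersion: number;   // bigint in the DB
  messageId: string;
  messageData: string;        // serialized event payload
  messageMetadata: string;
  sequenceNumber: number;     // global, monotonically increasing
}

interface EventDeliveryRow {
  messageId: string;          // FK to EventStore.messageId
  sequenceId: string;         // equal to aggregateId
  sequenceNumber: number;     // copied from EventStore
}

interface ConsumerGroupRow {
  consumerGroup: string;
  lastSequenceNumberSeen: number;
}
```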
And I have multiple EventSubscribers:
// In Application 1
@EventSubscriber("consumerGroup1", AccountOpenedEvent)
...
// In Application 2
@EventSubscriber("consumerGroup2", AccountOpenedEvent)
...
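In case it matters, the decorator roughly registers handlers into an in-memory registry like this (a simplified sketch, not my real code; it uses legacy/experimental TypeScript decorators):

```typescript
// Simplified sketch of @EventSubscriber: it records which handler belongs to
// which (consumerGroup, eventType) pair so a poller can dispatch to it later.
type EventHandler = (event: any) => Promise<void>;

interface Subscription {
  consumerGroup: string;
  eventType: string;
  handler: EventHandler;
}

const subscriptions: Subscription[] = [];

function EventSubscriber(consumerGroup: string, eventType: { name: string }) {
  return (target: any, propertyKey: string, descriptor: PropertyDescriptor) => {
    subscriptions.push({
      consumerGroup,
      eventType: eventType.name,
      handler: descriptor.value,
    });
  };
}

class AccountOpenedEvent {
  constructor(public accountId: string) {}
}

// Usage as in the examples above:
class AccountProjection {
  @EventSubscriber("consumerGroup1", AccountOpenedEvent)
  async onAccountOpened(event: AccountOpenedEvent) {
    // update the read model for application 1 here
  }
}
```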
Here is the flow when an AccountOpenedEvent is written to the EventStore table (sketched in code after this list):

1. For each application (i.e. application 1 and application 2), it scans the codebase to obtain all the @EventSubscriber declarations, creates a consumer group in the ConsumerGroup table with lastSequenceNumberSeen = 0, then runs a scheduler (with a 100ms polling interval) that polls all the events it is interested in (grouped by consumer group) from EventStore with the condition sequenceNumber >= lastSequenceNumberSeen.
2. For each event (EventStore) from step 1, calculate the sequenceId (here the sequenceId is equal to aggregateId); this sequenceId (together with the sequenceNumber) is used to guarantee the message delivery ordering. Persist it into the EventDelivery table, and update lastSequenceNumberSeen = sequenceNumber (this is to prevent the same event from being scanned again in the next interval).
3. For each application (i.e. application 1 and application 2), we have another scheduler (also with a 100ms polling interval) that polls the EventDelivery table (grouped by sequenceId and ordered by sequenceNumber ASC).
4. For each event (EventDelivery) from step 3, call the corresponding message handler; after the message is handled, acknowledge the message by deleting the record from EventDelivery.
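Put together, the two schedulers look roughly like this (a minimal sketch, not my real code; query is a stand-in for whatever RDBMS client we use, the SQL is illustrative, and error handling is omitted):

```typescript
// Sketch of the two 100ms pollers described above. `query` is a placeholder
// for the real RDBMS client (e.g. pg or mysql2); error handling is omitted.
async function query(sql: string, params: unknown[] = []): Promise<any[]> {
  /* delegate to the actual database driver here */
  return [];
}

const POLL_INTERVAL_MS = 100;

// Scheduler 1: copy new EventStore rows into EventDelivery for one consumer group.
// (Filtering by the event types the group is actually interested in is omitted.)
async function pollEventStore(consumerGroup: string) {
  const groups = await query(
    "SELECT lastSequenceNumberSeen FROM ConsumerGroup WHERE consumerGroup = ?",
    [consumerGroup]
  );
  const lastSeen = groups[0]?.lastSequenceNumberSeen ?? 0;

  // Using > rather than >= so the last seen event is not scanned again.
  const events = await query(
    "SELECT * FROM EventStore WHERE sequenceNumber > ? ORDER BY sequenceNumber ASC",
    [lastSeen]
  );

  for (const e of events) {
    // sequenceId = aggregateId, so ordering per aggregate is preserved.
    await query(
      "INSERT INTO EventDelivery (messageId, sequenceId, sequenceNumber) VALUES (?, ?, ?)",
      [e.messageId, e.aggregateId, e.sequenceNumber]
    );
    await query(
      "UPDATE ConsumerGroup SET lastSequenceNumberSeen = ? WHERE consumerGroup = ?",
      [e.sequenceNumber, consumerGroup]
    );
  }
}

// Scheduler 2: deliver pending EventDelivery rows in order per sequenceId,
// then acknowledge each one by deleting the row.
async function pollEventDelivery(dispatch: (messageId: string) => Promise<void>) {
  const pending = await query(
    "SELECT * FROM EventDelivery ORDER BY sequenceId ASC, sequenceNumber ASC"
  );
  for (const d of pending) {
    await dispatch(d.messageId); // look up and invoke the registered handler
    await query("DELETE FROM EventDelivery WHERE messageId = ?", [d.messageId]);
  }
}

setInterval(() => pollEventStore("consumerGroup1"), POLL_INTERVAL_MS);
setInterval(() => pollEventDelivery(async () => { /* handle the event */ }), POLL_INTERVAL_MS);
```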
Since I have 2 applications, I have to separate the delivery of the AccountOpenedEvent in EventStore into 2 transactions; and since the 2 applications don't know about each other, I can only do it passively. That's why I need the EventDelivery table and the polling schedulers.
Assume I can use redlock + cron to make sure only one instance runs the polling jobs, in case application 1 has more than one replica.
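Concretely, I imagine each replica doing something like this (a sketch assuming the node-redlock v5 and ioredis packages; the lock key, TTL, and interval are arbitrary):

```typescript
import Redis from "ioredis";
import Redlock from "redlock";

const redis = new Redis();
const redlock = new Redlock([redis], { retryCount: 0 }); // don't wait if another replica holds the lock

async function pollingTick() {
  let lock;
  try {
    // Only one replica acquires the lock for this tick; the others just skip it.
    lock = await redlock.acquire(["locks:poller:consumerGroup1"], 5000);
  } catch {
    return; // another replica is doing the polling right now
  }
  try {
    // await pollEventStore("consumerGroup1");
    // await pollEventDelivery(...);
  } finally {
    await lock.release();
  }
}

setInterval(pollingTick, 100);
```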
Application 1 will poll the AccountOpenedEvent, create a record in EventDelivery, and store the lastSequenceNumberSeen in its consumer group.
Application 2 will also poll the AccountOpenedEvent, create a record in EventDelivery, and store the lastSequenceNumberSeen in its consumer group.
Since application 1 and application 2 belong to different consumer groups, they treat the event store stream separately.
Here is the first problem: we have 2 schedulers, and we would have more if there were more consumer groups, and all of them put a heavy load on the database. How do I solve this? One of my ideas is to convert these 2 schedulers into jobs and put the jobs into a queue, where the queue processes the jobs per interval (let's say 100ms), but this seems to introduce large latency if a job is unfortunately placed at the end of the queue.
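What I had in mind for the "jobs in a queue" idea is roughly this (a sketch with a plain in-memory array, just to show where the latency worry comes from):

```typescript
// One shared timer drains a queue of poll jobs, instead of each consumer group
// running its own scheduler against the database every 100ms.
type PollJob = () => Promise<void>;

const jobQueue: PollJob[] = [];

function enqueuePollJob(job: PollJob) {
  jobQueue.push(job);
}

// Only one job runs per 100ms tick, so with N consumer groups a job at the end
// of the queue waits roughly N * 100ms, which is the latency problem described above.
setInterval(async () => {
  const job = jobQueue.shift();
  if (!job) return;
  await job();
  jobQueue.push(job); // re-enqueue so the job runs again on a later tick
}, 100);

// enqueuePollJob(() => pollEventStore("consumerGroup1"));
// enqueuePollJob(() => pollEventStore("consumerGroup2"));
```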
Here is the second problem: in the above flow, I introduced the second polling job to guarantee the message delivery ordering. But unlike the first one, it has no lastSequenceNumberSeen; the second polling job only removes a record from EventDelivery once the message has been handled. And it is common for a message to take more than 100ms to handle. In that case, the same event in EventDelivery will be scanned again.
I'm not sure what the common practice is, and I'm really struggling with how to implement this. I did lots of research on the internet. I see that some implementations do the message propagation with Debezium + Kafka (although I cannot use these 2 tools, I still don't fully understand how it works).
I know Debezium uses a CDC approach to tail the transaction log of the RDBMS and forward the messages to Kafka. And I have seen recommendations that we should not have multiple subscriptions on the same transaction log. Let's say Debezium guarantees that every event is propagated to Kafka; that means I need application 1 and application 2 to subscribe to the Kafka topic, each belonging to a different consumer group (and using aggregateId as the partition key). Since Kafka guarantees the message ordering (per partition), everything should work fine. But I don't think Kafka stores all the messages from the very beginning. Let's say it is configured to retain 1,000,000 messages: if a message handler keeps failing for some unexpected reason, the 1,000,000 messages after the failed one cannot be handled, and the 1,000,001st event will get lost... Although this is a rare case, I'm not sure I understand it correctly. The database table is the most reliable source of truth, as it stores all the events from the very beginning. If the system suffers from this case, does that mean I need to manually republish all the events to Kafka to recover the projection model?
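For reference, my understanding of the consumer side of the Debezium + Kafka setup is roughly this (a sketch using the kafkajs package; the topic name and broker address are made up, and Debezium would be the one producing to the topic with aggregateId as the message key):

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "application-1", brokers: ["kafka:9092"] });

// Each application uses its own groupId, so both receive every event.
// With aggregateId as the message key, all events of one aggregate land in the
// same partition, and Kafka preserves their order within that partition.
const consumer = kafka.consumer({ groupId: "consumerGroup1" });

async function run() {
  await consumer.connect();
  // fromBeginning only reaches as far back as the topic retention allows,
  // which is exactly my concern about losing older events.
  await consumer.subscribe({ topics: ["account-events"], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value?.toString() ?? "{}");
      // look up and call the handler registered for this event type
    },
  });
}

run().catch(console.error);
```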
And another case: if I have a new event subscriber which needs the historical events to build its projection model, then with Debezium + Kafka we need to assign a new consumerGroup and configure it to read the Kafka stream from the very beginning? It has the same problem: that consumerGroup can only get the last 1,000,000 events... But this is not an issue if we poll the database table directly instead.
I don't understand why most implementations don't poll the database table but use a message broker instead.
And I really need advice on how to implement a CQRS + ES framework, especially the message propagation part (keep in mind that I can only use an RDBMS + Redis without persistence).