
I'm working on a CQRS/ES project in which we have some doubts about how to handle problems that would be trivial in other architectures.

My scenario is the following:

I have a customer CRUD REST API, and each customer has a unique document (a number), so when registering a new customer I have to verify that no other customer already has that document, to avoid duplicates. In a CQRS/ES architecture with eventual consistency, I found that this kind of validation can be very hard to address.

It is important to note that my problem is not across microservices, but between the command application and the query application of the same microservice.

Also, we are using EventStore.

My current solution:

So what I do today is this: in my command application, before saving the CustomerCreated event, I ask the query application (backed by PostgreSQL) whether there is already a customer with that document, and if not, I allow the event to go through. But that doesn't guarantee 100%, right? My query side can be desynchronized, so I cannot fully trust it. That's where my second validation kicks in: when my query application is processing the events and saving them to PostgreSQL, I check again for a customer with that document, and if one exists, I reject that event and emit a compensating event to undo/cancel/inactivate the customer with the duplicated document, thereby closing that customer's stream in EventStore.
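
In rough pseudocode, the two checks look like this (all names are illustrative; `documents_in_read_model` stands for my PostgreSQL query side, `append_event` for the EventStore write):

```python
def handle_register_customer(cmd, documents_in_read_model, append_event):
    # First check: best effort only; the read model may be stale.
    if cmd["document"] in documents_in_read_model:
        raise ValueError("document already registered")
    append_event(f"customer-{cmd['customer_id']}",
                 {"type": "CustomerCreated", **cmd})

def project_customer_created(event, documents_in_read_model, append_event):
    # Second check, while the query side projects the event.
    if event["document"] in documents_in_read_model:
        # A duplicate slipped past the stale first check: compensate
        # and close this customer's stream.
        append_event(f"customer-{event['customer_id']}",
                     {"type": "CustomerInactivated",
                      "reason": "duplicate document"})
    else:
        documents_in_read_model.add(event["document"])
```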

Although this works, two things bother me here. The first is my command application relying on the query application: if the query application is down, my command side is affected (today I just return false from the validation if the query side is down, but still...). The second is: should a query/read model really be able to emit events? And if so, what is the correct way of doing it? Should the command side have some kind of API for that? Or should the query side emit the event directly to EventStore using some common shared library? And if I have more than one view/read model, which one should handle this?

I really hope someone can shed some light on these questions and help me with these matters.

  • What identifies a document as unique? Its contents? Id? Something else? – guillaume31 Mar 23 '18 at 13:58
  • @guillaume31 the customer has an ID (a UUID), but also a document, which is a number, like a social security number – Leonardo Ferreira Mar 23 '18 at 14:00
  • Possible duplicate of [CQRS Event Sourcing check username is unique or not from EventStore while sending command](https://stackoverflow.com/questions/31386244/cqrs-event-sourcing-check-username-is-unique-or-not-from-eventstore-while-sendin) – guillaume31 Mar 23 '18 at 14:04
  • Is the decision to make that rule eventually consistent an educated choice agreed with the domain expert? Because there are ways to make it immediately consistent (see other question above). – guillaume31 Mar 23 '18 at 14:09
  • @guillaume31 I don't see a way to make it immediately consistent without creating another stream on EventStore to perform the validation, and as far as I know, EventStore doesn't have transactions across different streams. So to create another stream with all the used documents, I would have two requests to EventStore with no way to guarantee both: if one of them fails, the other will already be done. – Leonardo Ferreira Mar 23 '18 at 14:13
  • @guillaume31 I read the link you sent, and my scenario is not quite the same: in that example they are using a relational database, which supports ACID transactions, so they can easily create a table to handle that validation and still guarantee the transaction. – Leonardo Ferreira Mar 23 '18 at 14:15
  • There is at least one comment on the accepted answer, and two other answers, that point to solutions compatible with an ES scenario, though. – guillaume31 Mar 23 '18 at 14:18
  • Also, the accepted answer includes a pragmatic solution in case you really want to go eventually consistent (proactive client side uniqueness check + reactive saga). – guillaume31 Mar 23 '18 at 14:22
  • @guillaume31 even with a proactive client-side uniqueness check I still have a window where duplication can happen, and as for the reactive saga, isn't that what I'm already doing, emitting a compensating event from the query/read model? And if so, that doesn't address my other questions about how to do that correctly, how to emit these events, etc... – Leonardo Ferreira Mar 23 '18 at 14:25

3 Answers


For reference, you may want to review what Greg Young has written about Set Validation.

I ask the query application (backed by PostgreSQL) whether there is already a customer with that document, and if not, I allow the event to go through. But that doesn't guarantee 100%, right?

That's exactly right: your read model is a stale copy, and may not have all of the information collected by the write model.

That's where my second validation kicks in: when my query application is processing the events and saving them to PostgreSQL, I check again for a customer with that document, and if one exists, I reject that event and emit a compensating event to undo/cancel/inactivate the customer with the duplicated document, thereby closing that customer's stream in EventStore.

This spelling doesn't quite match the usual designs. The more common implementation is that, if we detect a problem when reading data, we send a command message to the write model, telling it to straighten things out.

This is commonly referred to as a process manager, but you can think of it as the automation of a human supervisor of the system. Conceptually, a process manager is an event sourced collection of messages to be sent to the command model.
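
A rough sketch of that shape, simplified and with entirely hypothetical names (`send_command` stands in for whatever transport delivers commands to the write model; a production process manager would itself be event sourced rather than keeping state in memory):

```python
# Hypothetical process-manager sketch: it consumes events and, on
# detecting a duplicate, sends a command back to the write model.
# It never appends events itself.

class DeduplicationProcessManager:
    def __init__(self, send_command):
        self.first_owner = {}  # document -> customer_id that claimed it
        self.send_command = send_command

    def on_customer_created(self, event):
        document = event["document"]
        owner = self.first_owner.setdefault(document, event["customer_id"])
        if owner != event["customer_id"]:
            # Duplicate detected: straighten things out via a command.
            self.send_command({"type": "DeactivateCustomer",
                               "customer_id": event["customer_id"],
                               "reason": f"document {document} already taken"})
```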

You might also want to consider whether you are modeling your domain correctly. If documents are supposed to be unique, then maybe the command model should be using the document number as a key in the book of record, rather than using the customer. Or perhaps the document id should be a function of the customer data, rather than being an arbitrary input.

as far as I know, EventStore doesn't have transactions across different streams

Right: one of the things you really need to think about in general is where your stream boundaries lie. If set validation has significant business value, then you really need to think about getting the entire set into a single stream (or finding a way to constrain uniqueness without using a set).
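
To illustrate the single-stream idea, here's a toy in-memory stand-in (not a real EventStore client API): the whole set of documents lives in one stream, and an expected-version check on append plays the role of EventStore's optimistic concurrency control, so two racing registrations of the same document cannot both succeed.

```python
class ConcurrencyError(Exception):
    pass

class SetStream:
    """Toy single stream holding the whole set of documents."""
    def __init__(self):
        self.events = []

    def read(self):
        return list(self.events), len(self.events)  # (events, version)

    def append(self, event, expected_version):
        if expected_version != len(self.events):
            # Someone appended since our read: the caller must re-read
            # and re-check uniqueness before retrying.
            raise ConcurrencyError("stream moved since read")
        self.events.append(event)

def reserve_document(stream, document, customer_id):
    events, version = stream.read()
    if any(e["document"] == document for e in events):
        raise ValueError("document already registered")
    # At most one of two racing appends sees the expected version.
    stream.append({"type": "DocumentReserved",
                   "document": document,
                   "customer_id": customer_id},
                  expected_version=version)
```

The cost is re-reading the growing set on every registration; a common variation is one short stream per document number, appended with an "expected version: no stream" guard.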

How should I send a command message to the write model? via API? via a message broker like Kafka?

That's plumbing; it doesn't really matter how you do it, so long as you are sure that the command runs within its own transaction/unit of work.

VoiceOfUnreason
  • Hi @VoiceOfUnreason, thanks for the detailed response! How should I send a command message to the write model? via API? via a message broker like Kafka? – Leonardo Ferreira Mar 23 '18 at 15:09
  • I can't really use the document as the key because my application is multi-tenant, so the document can be duplicated across tenants but not within the same tenant, unless I somehow create a key based on the document and the tenant, but I don't like that solution – Leonardo Ferreira Mar 23 '18 at 15:14

So what I do today is this: in my command application, before saving the CustomerCreated event, I ask the query application (backed by PostgreSQL) whether there is already a customer with that document, and if not, I allow the event to go through. But that doesn't guarantee 100%, right? My query side can be desynchronized, so I cannot fully trust it.

No, you cannot safely rely on the query side, which is eventually consistent, to prevent the system from entering an invalid state.

You have two options:

  1. You permit the system to enter a temporary, pending state and then, eventually, bring it into a valid permanent state. For this you could allow the command to pass, yield the CustomerRegistered event, and, using a Saga/Process manager, verify against a collection uniquely indexed by document and issue a compensating command (not an event!), e.g. UnregisterCustomer.

  2. Instead of sending the command directly, you create and start a Saga/Process that preallocates the document in a collection uniquely indexed by document and, if successful, then sends the RegisterCustomer command. You can model the Saga as an entity.

So, in both solutions you use a Saga/Process manager. For the system to be resilient, you should make sure that the RegisterCustomer command is idempotent (so you can resend it if the Saga fails or is restarted); see the sketch below.
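
A minimal sketch of option no. 2, with hypothetical names throughout (`Reservations` stands in for the uniquely-indexed collection, `send_command` for the command transport):

```python
class DuplicateKeyError(Exception):
    pass

class Reservations:
    """Stand-in for a collection uniquely indexed by document."""
    def __init__(self):
        self.by_document = {}

    def insert(self, document, customer_id):
        existing = self.by_document.get(document)
        if existing is not None and existing != customer_id:
            raise DuplicateKeyError(document)
        # Re-inserting the same pair is a no-op, which keeps the
        # saga safe to restart from the beginning.
        self.by_document[document] = customer_id

def registration_saga(reservations, send_command, customer_id, document):
    try:
        # Preallocate the document before the command is ever sent.
        reservations.insert(document, customer_id)
    except DuplicateKeyError:
        send_command({"type": "RejectRegistration",
                      "customer_id": customer_id,
                      "reason": "duplicate document"})
        return
    # RegisterCustomer is idempotent, so resending it after a crash
    # or a saga restart is harmless.
    send_command({"type": "RegisterCustomer",
                  "customer_id": customer_id,
                  "document": document})
```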

Constantin Galbenu
  • Hi @Constantin Galbenu, thanks for the answer! So it's not a bad thing to try to use the query side for that validation, even though it may be out of sync?! And how should that CustomerRegistered command be sent to the write model? Via an API? A message broker? – Leonardo Ferreira Mar 23 '18 at 15:11
  • @LeonardoFerreira I wrote that you CANNOT rely on the query side. – Constantin Galbenu Mar 23 '18 at 15:20
  • @LeonardoFerreira are you asking about option no. 2? – Constantin Galbenu Mar 23 '18 at 15:21
  • Oh, alright, sorry. I thought that even with the query side being eventually consistent I could use it to mitigate the problem; thanks for clearing that up – Leonardo Ferreira Mar 23 '18 at 15:22
  • No, I'm asking about option 1, where you say "and issue a compensating command (not an event!), e.g. UnregisterCustomer" – Leonardo Ferreira Mar 23 '18 at 15:23
  • @LeonardoFerreira how exactly you send a command depends on your particular architecture; from the question's point of view it is irrelevant... – Constantin Galbenu Mar 23 '18 at 15:28
  • Can you please help me understand why not a compensating event? – Charlie Dec 09 '18 at 16:26
  • @Charlie only the Aggregate may issue an Event, but it doesn't have the ability to detect the invalid system state (a Saga using an Infrastructure service can). So the Saga, after it detects the invalid state, sends the Command to the second Aggregate. – Constantin Galbenu Dec 09 '18 at 17:48

You've butted up against a fairly common problem. I think the other answer by VoiceOfUnreason is worth reading. I just wanted to make you aware of a few more options.

  1. A simple approach I have used in the past is to create a lookup table. Your command tries to register the key in a unique-constraint table, and if it can reserve the key, the command can go ahead (see the sketch after this list).

  2. Depending on the nature of the data and the domain, you could let this 'problem' occur and raise additional events to mark it. If it is something that's important to the business or the way the application works, you can deal with it either manually or at the time via compensating commands. If the latter, it would make sense to use a process manager.

  3. In some (rare) cases where speed/capacity is less of an issue, you could consider old-fashioned locking and transactions. Admittedly, these are much better suited to CRUD-style implementations, but they can be used in CQRS/ES.
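
As an illustration of option 1, here is a runnable sketch using SQLite's unique constraint (the table and column names are made up for the example; any database that can enforce uniqueness would do):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document_keys (document TEXT PRIMARY KEY)")

def try_reserve(document: str) -> bool:
    """Reserve the key; return False if another command already did."""
    try:
        with conn:  # commits on success, rolls back on failure
            conn.execute("INSERT INTO document_keys VALUES (?)", (document,))
        return True
    except sqlite3.IntegrityError:
        return False  # unique constraint violated: reject the command

print(try_reserve("123-45-6789"))  # True: safe to proceed
print(try_reserve("123-45-6789"))  # False: duplicate document
```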

I have more detail on this in my blog post: How to Handle Set Based Consistency Validation in CQRS

I hope you find it helpful.

Codescribler
  • Hey @Codescribler, thanks for the answer. About the first approach, I don't think that works with a non-relational database, because I cannot guarantee both writes, the unique-constraint stream and my customer stream... I'll read the blog post you sent, thanks!! – Leonardo Ferreira Mar 23 '18 at 15:56
  • Yes, you are right, it doesn't. I often find I have more than one type of database. If you have a database which can guarantee uniqueness, then why not use it? No point in adding one just for one use case. – Codescribler Mar 23 '18 at 16:00