CQRS vs Database replica set

Question

One point about the CQRS (Command Query Responsibility Segregation) pattern confuses me.

As we know, in CQRS the read and write operations happen on separate databases, and at the application level we need to sync data to the read database whenever a write happens. For example, the following diagram shows a CQRS setup that writes data to SQLite and syncs it through RabbitMQ to MongoDB, which serves as the read database. (The basic example shown here is based on this article about CQRS.)

We also know that a production database usually runs as a cluster made up of a replica set, which provides high availability. In the replica set, the primary server receives write (and read) operations and syncs data to the secondary servers (which can respond to read requests). The cluster does a lot of complex work (such as running an election algorithm) to maintain consistency across the replica set.

My point of confusion is this: if we use the CQRS pattern at the application level, we have separate databases for reads and writes. Is each of those databases a standalone server, or a cluster (which itself contains read/write servers)?

This question doesn't include sample code; it is more of an architecture-level question.

score 2 · Answer 1 · answered Mar 21 '23 at 15:13

Note that it's not completely clear what your question is, but I'll interpret it as "why would we architect our system using CQRS instead of just taking advantage of a Mongo-style DB?"

The approach of a primary DB instance handling the writes with read replicas handling the reads is basically an implementation of CQRS. Likewise, most databases tend to use a write-ahead log, which is basically event sourcing, although for space reasons the DBs are more aggressive about snapshotting and truncating than a CQRS/ES-by-design application would be.

So if your DB is already providing CQRS under the hood, why would you architect your application along CQRS lines? The answer there is basically that DBs have to make fairly pessimistic assumptions about access patterns, consistency requirements, and the like. There's thus a universe of optimizations which they don't make. In our application architecture, we have greater knowledge of requirements and expectations, which enables us to incorporate those optimizations (which in many cases, to be clear, entail not having what a DB designer would consider to be "safeguards").

Additionally, many examples in the literature of CQRS (especially those where the commands are all just the CUD from CRUD) obscure the other benefit of CQRS: the ability to have a data model optimized for processing commands separately from the data models which handle queries (and having multiple read models is another benefit). For example, for low command volumes, a heavily normalized relational DB which doesn't have indices beyond primary keys may be up to the task, but a denormalized document store like Mongo is much more able to handle the query load.
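To make the "separate data models" point concrete, here is a minimal sketch, not taken from any particular system. It assumes a hypothetical two-table normalized schema in SQLite for the command side and uses a plain dict as a stand-in for a document store such as Mongo on the query side. The names `make_write_db`, `register_customer`, and `build_read_model` are illustrative only.

```python
import sqlite3


def make_write_db():
    """Command-side store: normalized tables, no indices beyond primary keys."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE address (
            customer_id INTEGER PRIMARY KEY REFERENCES customer(id),
            city TEXT NOT NULL
        );
        """
    )
    return db


def register_customer(db, customer_id, name, city):
    """Command handler: validate, then write to the normalized model."""
    if not name:
        raise ValueError("name is required")
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)", (customer_id, name)
        )
        db.execute(
            "INSERT INTO address (customer_id, city) VALUES (?, ?)",
            (customer_id, city),
        )


def build_read_model(db):
    """Projection: one denormalized document per customer (dict as a stand-in
    for a document store)."""
    rows = db.execute(
        "SELECT c.id, c.name, a.city "
        "FROM customer c JOIN address a ON a.customer_id = c.id"
    )
    return {cid: {"id": cid, "name": name, "city": city} for cid, name, city in rows}


db = make_write_db()
register_customer(db, 1, "Ada", "Paris")
read_model = build_read_model(db)
```

Queries then read `read_model` directly and never need the join; the command side remains free to enforce its own invariants on the normalized tables.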

score 1 · Answer 2 · answered Mar 18 '23 at 09:18

The read/write separation in the database is more of an implementation detail than CQRS itself. CQRS is all about separating the implementation of reads and writes; how you do it is up to you.

Secondly, in your picture, using a queue between the read and write sides is, I think, not optimal. For example:

What if you want to rebuild the read side?
What if the messages in the queue get out of order? (the guaranteed-ordering issue; see "How to guarantee message order in RabbitMQ (or any other asynchronous message queue service)")
What about duplicate messages? (at-least-once delivery; see https://mull-overthing.com/is-rabbitmq-at-least-once/)
What if you want to add multiple read sides?
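If you do keep a queue, the ordering and duplicate concerns have to be handled on the read side. One common mitigation (an assumption on my part, not something the question's diagram shows) is to stamp each event with a per-entity version number so the projection can drop duplicates and hold back events that arrive early. The event shape `{"id", "version", "data"}` and the class name `CustomerProjection` are hypothetical.

```python
class CustomerProjection:
    """Read-side projection that tolerates duplicate and out-of-order events."""

    def __init__(self):
        self.docs = {}      # entity id -> current read-model document
        self.versions = {}  # entity id -> last version applied
        self.pending = {}   # entity id -> {version: event} waiting for a gap to fill

    def handle(self, event):
        entity_id, version = event["id"], event["version"]
        last = self.versions.get(entity_id, 0)

        if version <= last:
            return  # duplicate delivery (at-least-once): already applied

        # Buffer the event, then apply every event that is now contiguous.
        buffered = self.pending.setdefault(entity_id, {})
        buffered[version] = event
        while last + 1 in buffered:
            last += 1
            ready = buffered.pop(last)
            self.docs.setdefault(entity_id, {}).update(ready["data"])
            self.versions[entity_id] = last


projection = CustomerProjection()
# Version 2 arrives first, then version 1, then version 1 again.
projection.handle({"id": 7, "version": 2, "data": {"city": "Paris"}})
projection.handle({"id": 7, "version": 1, "data": {"name": "Ada"}})
projection.handle({"id": 7, "version": 1, "data": {"name": "Ada"}})
```

This addresses ordering and duplicates only. Rebuilding the read side or adding a second one still requires a way to replay past events, which a plain queue does not give you once messages are consumed.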

A simpler and more flexible approach is to let the read side "pull" the data it needs from the write side. That also simplifies the implementation of the write side.
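One way to read "pull" is that the write side only keeps an append-only log of changes, and each read side tracks its own position in that log. The sketch below is a minimal in-memory illustration under that assumption; `WriteSide`, `PullingReadSide`, and the change shape `{"id", "name"}` are hypothetical names, not part of any library.

```python
class WriteSide:
    """Write side: only appends changes; it knows nothing about readers."""

    def __init__(self):
        self._log = []  # append-only list of changes

    def append(self, change):
        self._log.append(change)

    def changes_since(self, position):
        """Return every change at or after the given log position."""
        return self._log[position:]


class PullingReadSide:
    """Read side: pulls changes and remembers how far it has read."""

    def __init__(self, source):
        self.source = source
        self.position = 0       # checkpoint into the write side's log
        self.names_by_id = {}   # the read model itself

    def catch_up(self):
        for change in self.source.changes_since(self.position):
            self.names_by_id[change["id"]] = change["name"]
            self.position += 1

    def rebuild(self):
        """Rebuilding is just resetting the checkpoint and pulling again."""
        self.position = 0
        self.names_by_id = {}
        self.catch_up()


write_side = WriteSide()
write_side.append({"id": 1, "name": "Ada"})
write_side.append({"id": 2, "name": "Bob"})

first_reader = PullingReadSide(write_side)
first_reader.catch_up()

write_side.append({"id": 1, "name": "Ada L."})
second_reader = PullingReadSide(write_side)  # a new read side added later
second_reader.catch_up()
first_reader.catch_up()
```

Because the log is read in order from a checkpoint, ordering and duplicates stop being the reader's problem, and a new read side can be added without touching the write side.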

score 0 · Answer 3 · answered Jul 18 '23 at 19:50

Levi's answer makes the same point; here it is more briefly and in simpler terms:

All replicas of a database will almost always store the data in the same format, since database replication does not know about your querying patterns. If they don't, they are probably implementing CQRS internally. For example, a Customer database writes data as Customer {id, name, address}, and the read replicas will hold it in the same form.

So if your most frequent query is "list customer names from Random-City", then on either the primary or a secondary of the replica set you have to:

Scan all records > Find records with this city > Get those names

If you expect this query frequently, you pre-calculate it and store the data as CityName: [customer names], essentially a different structure of the same data that is optimised for reads in your context. Then it's a single read to get the whole list of names. This is the crux of CQRS, and the database usually can't do this in its replicas since it does not know your querying patterns.
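A small sketch of the difference, using in-memory Python structures as stand-ins for the two stores. The sample records and the `city` field are made up for illustration (the Customer shape above only has `address`).

```python
from collections import defaultdict

# Write-model shape: one record per customer (sample data, purely illustrative).
customers = [
    {"id": 1, "name": "Ada", "city": "Random-City"},
    {"id": 2, "name": "Bob", "city": "Elsewhere"},
    {"id": 3, "name": "Cy", "city": "Random-City"},
]


def names_in_city_scan(records, city):
    """What a same-shaped replica has to do: scan, filter, then pick names."""
    return [r["name"] for r in records if r["city"] == city]


def build_city_index(records):
    """Read model: the same data pre-grouped as city -> [customer names]."""
    index = defaultdict(list)
    for r in records:
        index[r["city"]].append(r["name"])
    return dict(index)


city_index = build_city_index(customers)
# A single lookup now answers the frequent query.
names = city_index.get("Random-City", [])
```

The index has to be kept up to date whenever a customer is added, renamed, or moves city, which is the synchronisation cost CQRS accepts in exchange for cheap reads.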
