Message types : how much information should messages contain?

Question

We are starting to broadcast events from one central application to other, possibly interested, consumer applications, and members of our team have different opinions about how much information we should put in our published messages.

The general idea/architecture is as follows:

In the producer application:
- The user interacts with some entities (Aggregate Roots in the DDD sense) that can be created/modified/deleted.
- Based on what is happening, Domain Events are raised (e.g. EntityXCreated, EntityYDeleted, EntityZTransferred, etc.; that is, not only CRUD, but mostly).
- Raised events are translated/converted into messages that we send to a RabbitMQ exchange.

In RabbitMQ (we are using RabbitMQ, but I believe the question is actually technology-independent):
- We define a queue for each consuming application.
- Bindings connect the exchange to the consumer queues (possibly with message filtering).

In the consuming application(s):
- The application consumes and processes messages from its queue.
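
The translation step on the producer side could look roughly like the sketch below. It is only an illustration in Python: the `EntityXCreated` fields, the `to_message` helper, the routing-key naming scheme and the JSON envelope are all assumptions made for the example, and the actual publish call to RabbitMQ is deliberately left out.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class EntityXCreated:
    """Hypothetical Domain Event raised by the producer's domain model."""
    entity_id: str
    name: str


def to_message(event) -> Tuple[str, bytes]:
    """Translate a Domain Event into a (routing_key, body) pair.

    Assumed convention: the routing key is derived from the event type, so
    that bindings can filter which messages reach each consumer queue.
    """
    event_type = type(event).__name__
    routing_key = f"producer.{event_type}"
    body = json.dumps({"type": event_type, "data": asdict(event)}).encode("utf-8")
    return routing_key, body


routing_key, body = to_message(EntityXCreated(entity_id="42", name="example"))
```

With a convention like this, a consumer interested only in creations would bind its queue with a pattern matching `producer.EntityXCreated`; whether the body carries a partial or a full view of the aggregate is exactly the open question.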

Based on Enterprise Integration Patterns, we are trying to define the Canonical format for our published messages, and are hesitating between two approaches:

Minimalist messages / event-store-ish: for each event published by the Domain Model, generate a message that contains only the relevant parts of the Aggregate Root. For instance, when an update is done, only publish information about the updated section of the aggregate root, more or less matching the process the end user goes through when using our application.
- Pros
  - small message size
  - very specialized message types
  - close to the "Domain Events"
- Cons
  - problematic if delivery order is not guaranteed (e.g. what if an Update message is received before the Create message?)
  - consumers need to know which message types to subscribe to (possibly a big list; domain knowledge is needed)
  - what if consumer state and producer state get out of sync?
  - how to handle a new consumer that registers in the future but has no knowledge of all the past events?

Fully-contained idempotent-ish messages: for each event published by the Domain Model, generate a message that contains a full snapshot of the Aggregate Root at that point in time. In practice this means handling only two kinds of messages, "Create or Update" and "Delete" (plus metadata with more specific info if necessary).
- Pros
  - idempotent (declarative messages stating "this is what the truth is like, synchronize yourself however you can")
  - fewer message formats to maintain/handle
  - allows consumers to progressively correct synchronization errors
  - consumers automatically handle new Domain Events as long as the resulting message follows the canonical data model
- Cons
  - bigger message payload
  - less pure
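
To make the trade-off concrete, here is a minimal Python sketch of the two message shapes and of how a consumer might handle each. All class and field names are hypothetical. In particular, the `version` field on the snapshot is an assumption that is not part of either approach as described above; some ordering marker of that kind is needed if snapshots are to tolerate out-of-order delivery.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SectionUpdated:
    """Minimalist message: carries only the part of the aggregate that changed."""
    entity_id: str
    section: str
    value: str


@dataclass(frozen=True)
class EntitySnapshot:
    """Fully-contained message: the whole aggregate at a point in time."""
    entity_id: str
    version: int  # assumption: increases with every change to the aggregate
    sections: Dict[str, str]


class DeltaConsumer:
    """Rebuilds state from minimalist messages; depends on delivery order."""

    def __init__(self) -> None:
        self.entities: Dict[str, Dict[str, str]] = {}

    def on_created(self, entity_id: str) -> None:
        self.entities[entity_id] = {}

    def on_updated(self, msg: SectionUpdated) -> None:
        # Fails if the update arrives before the create message.
        if msg.entity_id not in self.entities:
            raise LookupError(f"unknown entity {msg.entity_id}")
        self.entities[msg.entity_id][msg.section] = msg.value


class SnapshotConsumer:
    """Keeps the latest snapshot per aggregate."""

    def __init__(self) -> None:
        self.entities: Dict[str, EntitySnapshot] = {}

    def on_snapshot(self, msg: EntitySnapshot) -> None:
        current = self.entities.get(msg.entity_id)
        # Duplicates and stale snapshots are ignored, so redelivery is harmless.
        if current is None or msg.version > current.version:
            self.entities[msg.entity_id] = msg
```

The snapshot consumer needs no knowledge of which Domain Event produced the message, which is the "lower number of message formats" advantage; the delta consumer stays closer to the Domain Events but has to deal with ordering itself.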

Would you recommend one approach over the other?

Is there another approach we should consider?

score 5 · Accepted Answer · answered Sep 20 '16 at 19:08

"Is there another approach we should consider?"

You might also consider not leaking information out of the service acting as the technical authority for that part of the business.

This roughly means that your events carry identifiers, so that interested parties can know that an entity of interest has changed, and can query the authority for updates to the state.

"for each event published by the Domain Model, generate a message that contains a full snapshot of the Aggregate Root at that point in time"

This has the additional con that any change to the representation of the aggregate also implies a change to the message schema, which is part of the API. So internal changes to aggregates start rippling out across your service boundaries. If the aggregates you are implementing represent a competitive advantage to your business, you are likely to want to be able to adapt quickly; the ripples add friction that will slow your ability to change.

"what if consumer state and producer state get out of sync?"

As best I can tell, this problem indicates a design error. If a consumer needs state, which is to say a view built from the history of an aggregate, then it should be fetching that view from the producer, rather than trying to assemble it from a collection of observed messages.

That is to say, if you need state, you need history (complete, ordered). All a single event really tells you is that the history has changed, and you can evict your previously cached history.
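
As a rough sketch of that idea (in Python, with hypothetical names): the consumer keeps a cache of histories, treats each event as nothing more than "the history of this entity has changed", and reloads from the authority on demand. The `fetch_history` callable stands in for whatever query interface the producer would expose; that interface is an assumption of this example.

```python
from typing import Callable, Dict, List


class HistoryCache:
    """Consumer-side cache of aggregate histories owned by the producer."""

    def __init__(self, fetch_history: Callable[[str], List[dict]]) -> None:
        # fetch_history stands in for a query against the producer's API.
        self._fetch_history = fetch_history
        self._cache: Dict[str, List[dict]] = {}

    def on_entity_changed(self, entity_id: str) -> None:
        # The event only signals a change: drop whatever we had cached.
        self._cache.pop(entity_id, None)

    def history(self, entity_id: str) -> List[dict]:
        # Reload the complete, ordered history from the authority on demand.
        if entity_id not in self._cache:
            self._cache[entity_id] = self._fetch_history(entity_id)
        return self._cache[entity_id]
```

Because the event carries only an identifier, duplicated or reordered deliveries cost at most an extra fetch, and the message schema does not change when the aggregate's representation does.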

Again, responsiveness to change: if you change the implementation of the producer, and consumers are also trying to cobble together their own copy of the history, then your changes are rippling across the service boundaries.

- "...your events carry identifiers, so that interested parties can know that an entity ... has changed, and can query the authority for updates..." -> that makes sense, but are we not over-coupling consumers to the producer in that case? (i.e. the producer must be up and running for consumers to do their job properly) — tsimbalar, Sep 21 '16 at 04:32
- "any change to the representation of the aggregate also implies a change to the message schema, which is part of the API" -> but it's not an issue as long as we are only adding to the schemas, right? Not all parts of the aggregate need to be in the message — tsimbalar, Sep 21 '16 at 04:33
- "If a consumer needs state, ..., then it should be fetching that view from the producer, " -> makes sense! I'll wait a few days for other answers before accepting any ;) — tsimbalar, Sep 21 '16 at 04:34
