
I'm using a DDD/CQRS/ES approach and I have some questions about modeling my aggregate(s) and queries. As an example consider the following scenario:

A User can create a WorkItem, change its title, and associate other users with it. A WorkItem has participants (associated users), and a participant can add Actions to a WorkItem. Participants can execute Actions.

Let's just assume that Users are already created and I only need userIds.

I have the following WorkItem commands:

  • CreateWorkItem
  • ChangeTitle
  • AddParticipant
  • AddAction
  • ExecuteAction

These commands must be idempotent, so I can't add the same user or action twice.
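As a sketch of what that idempotency guard might look like inside the aggregate (all names here, such as WorkItem, add_participant and ParticipantAdded, are illustrative and not taken from any particular framework):

```python
# Minimal, simplified sketch: the aggregate tracks participant ids and
# emits no event when a duplicate AddParticipant command arrives.
class WorkItem:
    def __init__(self, work_item_id):
        self.id = work_item_id
        self.participant_ids = set()
        self.uncommitted_events = []

    def add_participant(self, user_id):
        # Idempotency: adding the same user twice emits no second event.
        if user_id in self.participant_ids:
            return
        self._apply({"type": "ParticipantAdded",
                     "workItemId": self.id,
                     "userId": user_id})

    def _apply(self, event):
        # Mutate state from the event, then record it for persistence.
        # (Replaying history would skip the append; omitted for brevity.)
        if event["type"] == "ParticipantAdded":
            self.participant_ids.add(event["userId"])
        self.uncommitted_events.append(event)
```

The second AddParticipant for the same userId is simply a no-op, which is one common way to make the command idempotent.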

And the following query:

  • WorkItemDetails (all info for a work item)

Queries are updated by handlers that handle domain events raised by WorkItem aggregate(s) (after they're persisted in the EventStore). All these events contain the WorkItemId. I would like to be able to rebuild the queries on the fly, if needed, by loading all the relevant events and processing them in sequence. This is because my users usually won't access WorkItems created one year ago, so I don't need to have these queries processed. So when I fetch a query that doesn't exist, I could rebuild it and store it in a key/value store with a TTL.
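The rebuild-on-miss idea above could be sketched like this; the projection logic, the load_events function and the cache API are assumptions for illustration, not a real library:

```python
def project(events):
    """Fold WorkItem events, in sequence order, into a WorkItemDetails dict."""
    details = {"title": None, "participants": [], "actions": []}
    for e in events:
        t = e["type"]
        if t in ("WorkItemCreated", "TitleChanged"):
            details["title"] = e["title"]
        elif t == "ParticipantAdded":
            details["participants"].append(e["userId"])
        elif t == "ActionAdded":
            details["actions"].append(e["actionId"])
    return details

def get_work_item_details(work_item_id, cache, load_events, ttl_seconds=3600):
    # Cache hit: return the stored view.
    cached = cache.get(work_item_id)
    if cached is not None:
        return cached
    # Cache miss: replay all events for this WorkItemId, store with a TTL.
    details = project(load_events(work_item_id))
    cache.set(work_item_id, details, ttl_seconds)
    return details
```

Old WorkItems cost nothing until someone actually asks for them, at which point the view is rebuilt once and then served from the key/value store until the TTL expires.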

Domain events have an aggregateId (used as the event streamId and shard key) and a sequenceId (used as the eventId within an event stream).
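One possible shape for that event envelope (the field names are my own, not a prescribed schema):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DomainEvent:
    aggregate_id: str   # event stream id, also used as the shard key
    sequence_id: int    # position of the event within its stream
    event_type: str
    payload: dict = field(default_factory=dict)  # e.g. contains workItemId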

So my first attempt was to create a large aggregate called WorkItem that has a collection of participants and a collection of actions. Participants and Actions are entities that live only within a WorkItem. A participant references a userId and an action references a participantId. They can have more information, but it's not relevant for this exercise. With this solution my large WorkItem aggregate can ensure that the commands are idempotent, because I can validate that I don't add duplicate participants or actions, and if I want to rebuild the WorkItemDetails query, I just load/process all the events for a given WorkItemId.

This works fine: since I only have one aggregate, the WorkItemId can be the aggregateId, so when I rebuild the query I just load all events for a given WorkItemId. However, this solution has the performance issues of a large aggregate (why load all participants and actions just to process a ChangeTitle command?).

So my next attempt is to have different aggregates, all with the same WorkItemId as a property, but only the WorkItem aggregate has it as its aggregateId. This fixes the performance issue, and I can still update the query because all events contain the WorkItemId, but now I can't rebuild it from scratch, because I don't know the aggregateIds of the other aggregates and therefore can't load their event streams and process them. They have a WorkItemId property, but that's not their real aggregateId. I also can't guarantee that I process events sequentially, because each aggregate has its own event stream, but I'm not sure whether that's a real problem.

Another solution I can think of is a dedicated event stream that consolidates all WorkItem events raised by the multiple aggregates. I could have event handlers that simply append the events fired by the Participant and Action aggregates to an event stream whose id would be something like "{workItemId}:allevents". This stream would be used only to rebuild the WorkItemDetails query. It sounds like a hack, though: basically I'm creating an "aggregate" that has no business operations.
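That consolidation handler could be as small as this sketch; the event_store API and the stream-naming convention are assumptions for illustration:

```python
# A handler subscribed to events from the Participant and Action aggregates
# appends a copy of each event to a per-WorkItem stream named
# "{workItemId}:allevents". Rebuilding WorkItemDetails then reads only
# this one stream.
def consolidate(event, event_store):
    work_item_id = event["payload"]["workItemId"]
    event_store.append(f"{work_item_id}:allevents", event)
```

Some event stores offer this natively as a server-side projection (a stream derived from other streams), in which case no hand-written handler is needed at all.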

What other solutions do I have? Is it uncommon to rebuild queries on the fly? Can it be done when events for multiple aggregates (multiple event streams) are used to build the same query? I've searched for this scenario and haven't found anything useful. I feel like I'm missing something that should be very obvious, but I haven't figured out what.

Any help on this is very much appreciated.

Thanks

  • Could you further explain "I can't rebuild it from scratch because I don't know the aggregateIds for the other aggregates, so I can't load their event streams and process them"? Is it possible to include the actionId in the ExecuteAction command? The actionId could be retrieved by querying with the workItemId after the action is added. Or did I miss something? – Yugang Zhou Dec 08 '14 at 01:25
  • Yes, rebuilding WorkItemDetails from scratch would mean loading all events (from the event store) for a given WorkItemId. But the events for Participant and Action are stored in their own event streams, although the event's payload may contain the WorkItemId. That's why I thought of having a dedicated event stream for the WorkItemDetails, so that when rebuilding I would just load the events in that stream. I don't want to do multiple queries (possibly hundreds) to load events from multiple streams (aggregates). – Manuel Felicio Dec 08 '14 at 23:20
  • Sorry, I still don't get it. WorkItemDetails is an aggregate or a query? Is it stored as the latest state of the WorkItem if it is a query? – Yugang Zhou Dec 09 '14 at 02:35
  • It's a query, updated from events from multiple aggregates. My point is that when I need to rebuild from scratch I don't know which event streams to load from because I'll need to load, for instance, events from Action 1, 2, etc, and I don't know how many actions there will be and their Ids. – Manuel Felicio Dec 10 '14 at 22:25

1 Answer


I don't think you should design your aggregates with querying concerns in mind. The Read side is here for that.

On the domain side, focus on consistency concerns (how small can the aggregate be while the domain still remains consistent within a single transaction?), concurrency (how big can it be without suffering concurrent-access problems / race conditions?) and performance (would we load thousands of objects into memory just to perform a simple command? Exactly what you were asking).

I don't see anything wrong with on-demand read models. It's basically the same as reading from a live stream, except you re-create the stream when you need it. However, this might be quite a lot of work for little gain, because most of the time entities are queried just after they are modified. If on-demand becomes "basically every time the entity changes", you might as well subscribe to live changes. As for "old" views, the definition of "old" is that they are no longer modified, so they don't need to be recalculated anyway, regardless of whether you have an on-demand or continuous system.

If you go the multiple small aggregates route and your Read Model needs information from several sources to update itself, you have a couple of options :

  • Enrich emitted events with additional data

  • Read from multiple event streams and consolidate their data to build the read model. No magic here, the Read side needs to know which aggregates are involved in a particular projection. You could also query other Read Models if you know they are up-to-date and will give you just the data you need.
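A minimal sketch of the second option, assuming a naming convention where each WorkItem has one participants stream and one actions stream (both stream names are invented for illustration; load_stream stands in for your event store's read API):

```python
# The read side knows which streams feed the WorkItemDetails projection,
# so rebuilding from scratch means reading those known streams and
# folding their events together.
def rebuild_details(work_item_id, load_stream):
    details = {"participants": [], "actions": []}
    for e in load_stream(f"WorkItemParticipants-{work_item_id}"):
        if e["type"] == "ParticipantAdded":
            details["participants"].append(e["userId"])
    for e in load_stream(f"WorkItemActions-{work_item_id}"):
        if e["type"] == "ActionAdded":
            details["actions"].append(e["actionId"])
    return details
```

The key point is that the stream ids are derivable from the WorkItemId alone, so the projection never needs to discover unknown aggregateIds first.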

See CQRS events do not contain details needed for updating read model

guillaume31
  • I know there's no magic here, but it seems wrong to design the aggregates so that I may rebuild queries in a deterministic way. But if I can't, at any given time, have a consistent view of my data, how can I say that my aggregates are well designed for consistency purposes? Maybe I'm missing advanced features in my EventStore, such as event streams that are projections of other event streams, which has nothing to do with how aggregates are designed. Am I trying to do something that is not so common in CQRS? – Manuel Felicio Dec 10 '14 at 22:46
  • You can say your aggregates are well designed when each transaction leaves the aggregate in a state that doesn't violate any domain invariant or rule (except if you use eventual consistency). That the view is coherent is only a byproduct of that and not the main goal. Also, a view is only valid at a given point in time, and changes based on what the view showed can be rejected by the aggregate because they are no longer valid. – guillaume31 Dec 11 '14 at 08:37
  • I'll let others answer about that, but yes, based on my experience, what you're trying to do is uncommon in CQRS. – guillaume31 Dec 11 '14 at 08:42
  • 1
    I was unsure if it was a common use case or not. I'll try to model my aggregates as lists. WorkItemParticipants and WorkItemActions. This will also let me avoid duplicates and have two separate queries, for participants and actions. The WorkItemDetails query can be rebuilt from these two. Again, this is only an issue when rebuilding from scratch. I'll accept your answer. It would be nice to have others comment on their experience with this particular use case as well. – Manuel Felicio Dec 11 '14 at 09:53
  • Enriching events is not a good idea. I tried very hard, but it is just wrong, because the other data is not related to the aggregate; it is not even in its transaction scope, so the data could be wrong. You want to buy a Ferrari; when you check out, someone updates the Ferrari description to "Potatoes". The event "customer paid 300,000 USD for a bag of potatoes" is emitted, and then the Ferrari description is fixed, but your bad event will be there... forever. It would have been better to just emit "customer paid 300,000 USD for item 5321234". – Narvalex Nov 15 '17 at 12:49