I'm designing an Event Store on AWS and I chose DynamoDB because it seemed the best option. My design seems to be quite good, but I'm facing some issues that I can't solve.
**The design**
Events are uniquely identified by the pair (StreamId, EventId):

- **StreamId**: the same as the AggregateId, which means one Event Stream per Aggregate.
- **EventId**: an incremental number that preserves the ordering of events within the same Event Stream.
Events are persisted in DynamoDB. Each event maps to a single record in a table whose mandatory fields are StreamId, EventId, EventName and Payload (more fields can be added easily).
The partition key is StreamId and the sort key is EventId.
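Roughly, the table looks like this (a minimal boto3 sketch; the table name `EventStore`, the region, billing mode and stream view type are illustrative choices, not requirements of the design):

```python
import boto3

# Assumption: region and billing mode are just examples; adjust to your setup.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="EventStore",
    AttributeDefinitions=[
        {"AttributeName": "StreamId", "AttributeType": "S"},
        {"AttributeName": "EventId", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "StreamId", "KeyType": "HASH"},   # partition key
        {"AttributeName": "EventId", "KeyType": "RANGE"},   # sort key
    ],
    BillingMode="PAY_PER_REQUEST",
    # Needed later for publishing events via DynamoDB Streams + Lambda.
    StreamSpecification={"StreamEnabled": True, "StreamViewType": "NEW_IMAGE"},
)
```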
Optimistic locking is used while writing an event to an Event Stream. To achieve this, I use DynamoDB conditional writes: if an event with the same (StreamId, EventId) already exists, I need to recompute the aggregate, recheck the business conditions, and write again if they still pass.
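A minimal sketch of such a conditional append (boto3; the helper and table names are illustrative):

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("EventStore")  # assumed table name

def append_event(stream_id: str, event_id: int, event_name: str, payload: str) -> bool:
    """Try to append an event; return False if (StreamId, EventId) already exists."""
    try:
        table.put_item(
            Item={
                "StreamId": stream_id,
                "EventId": event_id,
                "EventName": event_name,
                "Payload": payload,
            },
            # Optimistic locking: the write fails if another writer has already
            # persisted an event with this key.
            ConditionExpression="attribute_not_exists(StreamId) AND attribute_not_exists(EventId)",
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # caller recomputes the aggregate and retries
        raise
```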
**Event Streams**
Each Event Stream is identified by the partition key. Querying a stream for all its events is equivalent to querying for partitionKey = ${streamId} with sortKey between 0 and MAX_INT.
Each Event Stream identifies one and only one aggregate. This makes it possible to handle concurrent writes on the same aggregate using optimistic locking, as explained above, and it also gives great performance when recomputing an aggregate.
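Reading a whole stream is a single Query, sketched here with boto3 (same assumed table name as above):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("EventStore")  # assumed table name

def load_stream(stream_id: str) -> list:
    """Return all events of one stream in EventId order, following pagination."""
    events, start_key = [], None
    while True:
        kwargs = {"KeyConditionExpression": Key("StreamId").eq(stream_id)}
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        response = table.query(**kwargs)  # items come back sorted by the sort key
        events.extend(response["Items"])
        start_key = response.get("LastEvaluatedKey")
        if not start_key:
            return events
```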
**Publication of events**
Events are published by combining DynamoDB Streams with Lambda.
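The Lambda handler consuming the table's stream looks roughly like this (publishing to an SNS topic is only an assumption for illustration; any bus or broker would do, and the topic ARN is a placeholder):

```python
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:domain-events"  # placeholder

def handler(event, context):
    """Triggered by the DynamoDB Stream; publishes newly inserted events."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue  # only new events matter, not updates/deletes
        new_image = record["dynamodb"]["NewImage"]  # still in DynamoDB JSON format
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject=new_image["EventName"]["S"],
            Message=json.dumps(new_image),
        )
```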
**Replay of events**
Here is where the issues start. Because each event stream maps to exactly one aggregate (which leads to a huge number of event streams), there is no easy way to know which streams I have to query in order to replay all events.
I was thinking of keeping an additional record somewhere in DynamoDB that stores all StreamIds in an array. I could then read it and query each stream for its events, but if a new stream is created while I'm replaying, I'll miss it. A rough sketch of this idea is below.
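Something along these lines (a hypothetical `StreamRegistry` table with a single item; names are mine and this is exactly the approach whose race condition worries me):

```python
import boto3

registry = boto3.resource("dynamodb").Table("StreamRegistry")  # hypothetical table

def register_stream(stream_id: str) -> None:
    """Called when a new stream is created: add its id to the single registry item."""
    registry.update_item(
        Key={"Id": "ALL_STREAMS"},
        UpdateExpression="ADD StreamIds :s",
        ExpressionAttributeValues={":s": {stream_id}},  # DynamoDB string set
    )

def known_stream_ids() -> set:
    """Read the registry; streams registered after this read are missed by a replay."""
    item = registry.get_item(Key={"Id": "ALL_STREAMS"}).get("Item", {})
    return item.get("StreamIds", set())
```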
Am I missing something? Or is my design simply wrong?