
We have a standard SQL-based DB that stores users' activities. Since millions of activities are stored in the DB, doing aggregation on the fly would be very expensive, so we are thinking of pushing/replicating these activities into DynamoDB and using DynamoDB Streams (events) + Lambda to provide real-time aggregation.
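
To make the setup concrete, here is a rough sketch of the kind of stream-triggered Lambda we are picturing; the `ActivityAggregates` table, the `userId` attribute, and the per-user count are placeholders for illustration, not our actual schema:

```python
import boto3

# Placeholder aggregate table; all names here are illustrative assumptions.
dynamodb = boto3.resource("dynamodb")
aggregates = dynamodb.Table("ActivityAggregates")

def handler(event, context):
    """Triggered by the DynamoDB stream; keeps a per-user activity count."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        # Assumes the stream view type includes new images.
        new_image = record["dynamodb"]["NewImage"]
        user_id = new_image["userId"]["S"]
        # Atomically bump the running count for this user.
        aggregates.update_item(
            Key={"userId": user_id, "metric": "activity_count"},
            UpdateExpression="ADD #c :one",
            ExpressionAttributeNames={"#c": "count"},
            ExpressionAttributeValues={":one": 1},
        )
```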

This should work if the types of aggregation that need to be done are fixed from the beginning. In our case, we want to keep adding new aggregations in the future based on new use cases. However, I am not sure how I can regenerate those stream events, as they are no longer available after 24 hours!

Can anyone explain how we can add more aggregations after the initial setup?

apdev

1 Answer


If you can take the system offline for an outage, you could stop all writes, perform a full table scan, calculate your aggregation, and then turn everything back on with the stream keeping your aggregation up to date from that point.
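
As a rough sketch of that backfill (assuming the activity items carry a `userId` attribute and you are maintaining a per-user count; all table and attribute names are placeholders), it is a paginated Scan followed by seeding the aggregate table:

```python
import boto3
from collections import Counter

# Offline backfill sketch: scan the activity table while writes are stopped
# and compute the aggregate in one pass. Names are assumptions for illustration.
dynamodb = boto3.resource("dynamodb")
activities = dynamodb.Table("Activities")
aggregates = dynamodb.Table("ActivityAggregates")

def backfill_counts():
    counts = Counter()
    kwargs = {"ProjectionExpression": "userId"}
    while True:
        page = activities.scan(**kwargs)
        for item in page["Items"]:
            counts[item["userId"]] += 1
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

    # Seed the aggregate table; the stream keeps it current afterwards.
    for user_id, count in counts.items():
        aggregates.put_item(Item={
            "userId": user_id,
            "metric": "activity_count",
            "count": count,
        })
```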

If you cannot take the table offline, you will probably have to do something fancier, such as cloning the table via a snapshot taken when you enable the stream, and then calculating the aggregation over the data up until the point at which the stream was enabled.
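
One subtlety is merging the snapshot-derived totals back into aggregate items the stream has already been updating in the meantime. A sketch of doing that idempotently, using an atomic ADD guarded by a one-shot flag (again, all names are placeholders):

```python
import boto3
from botocore.exceptions import ClientError

# Merge a snapshot-derived count into the live aggregate item. The ADD
# preserves increments the stream has already applied since the snapshot,
# and the condition makes the merge idempotent. Names are placeholders.
aggregates = boto3.resource("dynamodb").Table("ActivityAggregates")

def merge_backfill(user_id, snapshot_count):
    try:
        aggregates.update_item(
            Key={"userId": user_id, "metric": "activity_count"},
            UpdateExpression="ADD #c :n SET backfilled = :done",
            ConditionExpression="attribute_not_exists(backfilled)",
            ExpressionAttributeNames={"#c": "count"},
            ExpressionAttributeValues={":n": snapshot_count, ":done": True},
        )
    except ClientError as err:
        # A failed condition means this user was already merged; ignore it.
        if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise
```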

Depending on the calculations you are performing this could be very cumbersome, but I don't think there is a way around it. I would also love there to be a native aggregation framework for DynamoDB.

Derrops