I'm currently working on a prototype in C# that utilises CQRS and event sourcing and I've hit a performance bottleneck in my projections to an SQL database.
My first prototype was built with Entity Framework 6, code first. This choice was made primarily to get going and because the read side would benefit from LINQ.
Every (applicable) event is consumed by multiple projections, which either create or update the corresponding entity.
Such a projection currently look like this:
public async Task HandleAsync(ItemPlacedIntoStock @event)
{
var bookingList = new BookingList();
bookingList.Date = @event.Date;
bookingList.DeltaItemQuantity = @event.Quantity;
bookingList.IncomingItemQuantity = @event.Quantity;
bookingList.OutgoingItemQuantity = 0;
bookingList.Item = @event.Item;
bookingList.Location = @event.Location;
bookingList.Warehouse = @event.Warehouse;
using (var repository = new BookingListRepository())
{
repository.Add(bookingList);
await repository.Save();
}
}
This isn't very well performing, most likely for the reason that I call DbContext.SaveChanges()
in the IRepository.Save()
method. One for each event.
What options should I explore next? I don't want to spent days chasing ideas that might prove to be only marginally better.
I currently see the following options:
- Stick with EF, but batch process the events (i.e. new/save context every X number of events) as long as the projection is running behind.
- Try to do more low-level SQL, for example with ADO.NET.
- Don't use SQL to store the projections (i.e. use NoSQL)
I expect to see millions of events because we plan to source a large legacy application and migrate data in the form of events. New projections will also be added often enough so the processing speed is an actual issue.
Benchmarks:
- The current solution (EF, save after every event) processes ~200 events per second (per projection). It does not scale directly with the number of active projections (i.e. N projections process less than N * 200 events/second).
- When the projections aren't saving the context, the number of events/second increases marginally (less than double)
- When the projections don't do anything (single return statement), the processing speed of my prototype pipeline is ~30.000 events/second globally
Updated benchmarks
- Single-threaded inserts via ADO.NET
TableAdapter
(newDataSet
and newTableAdapter
on each iteration): ~2.500 inserts/second. Did not test with projection pipeline but standalone - Single-threaded inserts via ADO.NET
TableAdapter
that does notSELECT
after inserting: ~3.000 inserts/second- Single-threaded ADO.NET
TableAdapter
batch-insert of 10.000 rows (single dataset, 10.000 rows in-memory): >10.000 inserts/second (my sample size and window was too small)
- Single-threaded ADO.NET