Apache Storm with Trident provides exactly-once semantics. Imagine I have the following topology:
TridentSpout -> SumMoneyBolt -> SaveMoneyBolt -> Persistent Storage.
SumMoneyBolt sums monetary values in memory, then passes the result to SaveMoneyBolt, which should save the final value to a remote storage/database.
It is very important that each value is calculated and stored in the database exactly once. We do not want to accidentally double count the money.
So how does Storm with Trident handle network partitions or failures in the following scenario: the write request has been sent, the database has received it and logged the transaction, but SaveMoneyBolt dies or is partitioned from the network before receiving the database's response?
I assume that if SaveMoneyBolt had died, Trident would retry the batch, but we cannot afford double counting.
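To make the concern concrete, here is a minimal sketch of the failure I am worried about. It is a plain Python simulation, not Storm or Trident code: `FakeDatabase`, `LostAck`, `save_batch`, and `process_with_retry` are hypothetical names I made up for illustration, and the retry loop only reflects my assumption about how a replayed batch would behave against a non-idempotent write.

```python
class FakeDatabase:
    """Hypothetical stand-in for the remote store; not a Storm or Trident API."""

    def __init__(self):
        self.total = 0

    def add(self, amount):
        # The increment is applied and committed on the database side.
        self.total += amount


class LostAck(Exception):
    """Raised when the database response never reaches the writer."""


def save_batch(db, amount, ack_lost):
    # The database receives the request and commits it.
    db.add(amount)
    if ack_lost:
        # The writer dies or is partitioned before seeing the response.
        raise LostAck()


def process_with_retry(db, amount, fail_first_attempt):
    attempts = 0
    while True:
        attempts += 1
        try:
            lose_ack = fail_first_attempt and attempts == 1
            save_batch(db, amount, ack_lost=lose_ack)
            return attempts
        except LostAck:
            # Assumption: the batch is simply replayed after the failure.
            continue


db = FakeDatabase()
attempts = process_with_retry(db, 100, fail_first_attempt=True)
print(attempts, db.total)
```

In this sketch the batch of 100 is written twice, so the stored total ends up at 200 even though only one batch was ever emitted. That is exactly the double counting I need to avoid.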
How are such scenarios handled?
Thanks.