In my Spring Boot app, customers can submit files. Each customer's files are merged together by a scheduled task that runs every minute. Performing the merge in a scheduler has a number of drawbacks. For example, end-to-end tests are difficult to write, because the test has to wait for the scheduler to run before it can retrieve the result of the merge.
Because of this, I would like to use an event-based approach instead, i.e.
- Customer submits a file
- An event is published that contains this customer's ID
- The merging service listens for these events and performs a merge operation for the customer in the event object
This would have the advantage of triggering the merge operation immediately after there is a file available to merge.
However, there are a number of problems with this approach that I would like some help with.
Concurrency
The merging is a reasonably expensive operation. It can take up to 20 seconds, depending on how many files are involved. Therefore the merging will have to happen asynchronously, i.e. not on the thread that publishes the merge event. Also, I don't want to perform multiple merge operations for the same customer concurrently, in order to avoid the following scenario:
- Customer1 saves file2, triggering merge operation2 for file1 and file2
- A very short time later, customer1 saves file3, triggering merge operation3 for file1, file2, and file3
- Merge operation3 completes, saving merge-file3
- Merge operation2 completes, overwriting merge-file3 with merge-file2
To avoid this, I plan to process merge operations for the same customer in sequence using locks in the event listener, e.g.
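import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import org.springframework.context.ApplicationListener;
import org.springframework.stereotype.Component;

@Component
public class MergeEventListener implements ApplicationListener<MergeEvent> {

    // One lock per customer, so merges for different customers can still run in parallel
    private final ConcurrentMap<String, Lock> customerLocks = new ConcurrentHashMap<>();

    @Override
    public void onApplicationEvent(MergeEvent event) {
        var customerId = event.getCustomerId();
        var customerLock = customerLocks.computeIfAbsent(customerId, key -> new ReentrantLock());
        customerLock.lock();
        try {
            mergeFileForCustomer(customerId);
        } finally {
            // Always release the lock, even if the merge throws
            customerLock.unlock();
        }
    }

    private void mergeFileForCustomer(String customerId) {
        // implementation omitted
    }
}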
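A variation I have been considering, instead of blocking a thread on a lock, is to chain each customer's merges onto that customer's previous merge, so they run in submission order on a separate executor. The following is only a minimal plain-Java sketch of that idea, not code from my app: `PerCustomerSerializer` and `submit` are hypothetical names, and the `Runnable` stands in for `mergeFileForCustomer`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper: runs tasks for the same customer one after another, off the caller's thread. */
class PerCustomerSerializer {

    // Last scheduled task per customer; entries are never removed in this sketch
    private final ConcurrentMap<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();
    private final Executor executor;

    PerCustomerSerializer(Executor executor) {
        this.executor = executor;
    }

    /** Schedules the merge to start only after the customer's previous merge has finished. */
    CompletableFuture<Void> submit(String customerId, Runnable merge) {
        return tails.compute(customerId, (id, tail) -> {
            CompletableFuture<Void> previous =
                    (tail == null) ? CompletableFuture.completedFuture(null) : tail;
            // handle(...) turns a failed predecessor into a normal completion,
            // so one failed merge does not block the merges queued behind it
            return previous.handle((result, error) -> null).thenRunAsync(merge, executor);
        });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        PerCustomerSerializer serializer = new PerCustomerSerializer(pool);
        // Both tasks belong to customer1, so the second starts only after the first ends
        CompletableFuture<Void> first =
                serializer.submit("customer1", () -> System.out.println("merge operation2"));
        CompletableFuture<Void> second =
                serializer.submit("customer1", () -> System.out.println("merge operation3"));
        CompletableFuture.allOf(first, second).join();
        pool.shutdown();
    }
}
```

With this, the thread that publishes the event returns immediately, and merge operation2 can no longer finish after merge operation3 for the same customer. Like the lock map above, it only serializes work inside a single application instance.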
Fault-Tolerance
How do I recover if, for example, the application shuts down in the middle of a merge operation, or an error occurs during one?
One of the advantages of the scheduled approach is that it contains an implicit retry mechanism, because every time it runs it looks for customers with unmerged files.
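One way I imagine keeping that implicit retry is to record pending work before attempting the merge, and to clear the record only after the merge succeeds, so a much less frequent scheduled pass can pick up anything left behind. The sketch below is plain Java under loud assumptions: `MergeCoordinator`, `onFileSubmitted`, `retryPending`, and `hasPending` are hypothetical names, and the in-memory map is only a stand-in for durable state (e.g. a flag in the database), which would be needed to survive a shutdown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Consumer;

/** Hypothetical sketch: event-triggered merge plus a retry pass over customers with pending work. */
class MergeCoordinator {

    // customerId -> number of submissions not yet covered by a successful merge.
    // In-memory stand-in; a real app would need durable state to survive a restart.
    private final ConcurrentMap<String, Long> pending = new ConcurrentHashMap<>();
    private final Consumer<String> mergeOperation;

    MergeCoordinator(Consumer<String> mergeOperation) {
        this.mergeOperation = mergeOperation;
    }

    /** Event path: record the pending work first, then attempt the merge right away. */
    void onFileSubmitted(String customerId) {
        pending.merge(customerId, 1L, Long::sum);
        tryMerge(customerId);
    }

    /** Safety net: could be called from a scheduled task to retry whatever is still pending. */
    void retryPending() {
        for (String customerId : pending.keySet()) {
            tryMerge(customerId);
        }
    }

    boolean hasPending(String customerId) {
        return pending.containsKey(customerId);
    }

    private void tryMerge(String customerId) {
        Long seen = pending.get(customerId);
        if (seen == null) {
            return; // nothing to merge for this customer
        }
        try {
            mergeOperation.accept(customerId);
            // Clear the entry only if no new file arrived while the merge was running
            pending.remove(customerId, seen);
        } catch (RuntimeException e) {
            // Leave the entry in place so the next retryPending() call tries again
        }
    }
}
```

The end-to-end tests would then exercise the event path directly, while the scheduled pass would only matter after a failure.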
Summary
I suspect my proposed solution may be re-implementing (badly) an existing technology for this type of problem, e.g. JMS. Is my proposed solution advisable, or should I use something like JMS instead? The application is hosted on Azure, so I can use any services it offers.
If my solution is advisable, how should I deal with fault-tolerance?