tl;dr
The easiest way to implement asynchronous processing is to adopt the transaction approach, where the process is split into a Request and a separate method to check the State or status of the transaction that was created by this initial Request.
The API responds to a request with a token that represents the transaction. That request will also spawn off any background processing or message queue logic.
The client must call the API to check the State (status) of the transaction and retrieve the result when/if it is ready.
Common patterns in async transactions may involve other method calls to retrieve processing logs, or to cancel the pending request.
This Transaction model is most simply experienced in online financial transactions through 3rd party payment providers.
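A minimal sketch of the client's side of this pattern, assuming a plain HTTP JSON API (the endpoint paths and field names below are hypothetical, just to illustrate the request/poll/retrieve shape):

```python
import time
import requests

API = "https://api.example.com"  # hypothetical service

# 1. Submit the request; the API responds with a token for the new transaction.
token = requests.post(f"{API}/exports").json()["transactionId"]

# 2. Poll the transaction State until the result is ready.
while True:
    status = requests.get(f"{API}/exports/{token}").json()
    if status["state"] == "COMPLETE":
        break
    time.sleep(5)  # back off between polls

# 3. Retrieve the result once the transaction reports completion.
result = requests.get(f"{API}/exports/{token}/result").content
```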
source: https://multithreaded.stitchfix.com/blog/2017/05/09/patterns-of-soa-asynchronous-transaction/
Original Post:
One solution, if the per-row processing can be batched or executed in parallel, is to recognise that there are multiple levels of Tasks here:
- the Export Request Task
- individual data preparation activities
- the Export Processing or Result task
- Notify the user that a response is ready
- or the user can poll for the state of the original Request
Depending on the frequency and concurrency of requests (how many of these Export Requests need to be serviced at the same time), you could use 2 queues: one to process individual records, and another to aggregate the results.
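A sketch of that two-queue topology, using Python's in-process queues as stand-ins for a real message broker (SQS, Azure Service Bus, RabbitMQ, etc.); the message shapes are illustrative:

```python
import queue

record_queue = queue.Queue()     # one message per record (or batch) to export
aggregate_queue = queue.Queue()  # one message per request once all records are done

# A record message carries the data to process plus a reference to the
# transaction header, so workers can update the shared progress counter.
record_message = {"header_id": "abc123", "record_id": 42}

# An aggregation message only needs the header reference; the aggregator
# collects the stored per-record results itself.
aggregate_message = {"header_id": "abc123"}
```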
The post Patterns of SOA: Asynchronous Transaction (linked above) has some useful diagrams and background on the concept.
The call to step 1 should create a Transaction Header record for this task in a table or some other persistent storage. This header might include a count of all the records (or batches), a processed-record counter starting at zero, and a primary reference key.
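As an illustrative sketch, the header might look something like this (the field names are assumptions; the store could equally be a SQL table or a document):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TransactionHeader:
    reference_key: str       # primary reference key returned to the client
    total_records: int       # count of all records (or batches) to process
    processed_records: int   # starts at zero, incremented by the workers
    state: str               # e.g. "PENDING", "PROCESSING", "COMPLETE"
    result_location: Optional[str] = None  # set once the final output is attached
```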
Then the API will add a message into a queue for each record (or batch) that needs to be exported; each message should include a reference to the header.
- The response from the API should be (or include) the reference to this header.
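Putting that together, the request handler might look roughly like this, reusing the TransactionHeader sketch above (`db` and `record_queue` are hypothetical stand-ins for your persistence layer and message broker):

```python
import uuid

def handle_export_request(records, db, record_queue):
    # Step 1: persist a header that tracks the overall transaction.
    header = TransactionHeader(
        reference_key=str(uuid.uuid4()),
        total_records=len(records),
        processed_records=0,
        state="PROCESSING",
    )
    db.save(header)

    # Step 2: one message per record (or batch), each carrying the header
    # reference so workers can update the shared progress counter.
    for record in records:
        record_queue.put({"header_id": header.reference_key, "record": record})

    # The response is (or includes) the reference to the header.
    return {"transactionId": header.reference_key}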
The queue worker processes each record as required, then updates the header to reflect the processed count.
- If your queuing technology does not support aggregation of results, you will also need to store the result for each processed record somewhere, perhaps linked or attached to the transaction header.
If the process count indicates that all records are completed, a message should be added to an additional queue to process the batch of results.
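A sketch of that record-queue worker, under the same assumptions (`export_single_record` stands in for your per-row logic; the key detail is that the counter increment must be atomic, e.g. `UPDATE ... SET processed = processed + 1` in SQL, because many workers update the same header concurrently):

```python
def export_single_record(record):
    """Stand-in for your per-row export logic."""
    ...

def process_record_message(message, db, aggregate_queue):
    # Do the per-row work.
    result = export_single_record(message["record"])

    # If the queue technology cannot aggregate results, persist the
    # per-record result against the transaction header instead.
    db.save_result(message["header_id"], result)

    # Atomic increment-and-read, so exactly one worker observes the
    # final count even under concurrency.
    processed, total = db.increment_processed(message["header_id"])

    # That one worker triggers the aggregation step.
    if processed == total:
        aggregate_queue.put({"header_id": message["header_id"]})
```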
This queue will aggregate the results of the individual items and prepare the final output; this will in turn be linked or attached to the transaction header, and the state updated in this header record.
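And a matching sketch of the aggregation-queue worker (`build_export` stands in for whatever produces your final output; `notifier` is a hypothetical push mechanism):

```python
def build_export(results):
    """Stand-in: combine individual results into the final export file."""
    ...

def process_aggregation_message(message, db, notifier):
    header_id = message["header_id"]

    # Collect the stored per-record results and combine them.
    results = db.load_results(header_id)
    export_file = build_export(results)

    # Attach the final output to the header and update its state.
    location = db.attach_output(header_id, export_file)
    db.update_state(header_id, "COMPLETE", result_location=location)

    # Push (and/or email) to tell the client the export is ready.
    notifier.push(header_id, location)
```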
- Use push methodology (and/or email) to inform the client that the export request has been processed and is ready to be received.
This methodology means that we do not have to poll to determine completeness; however, it does require a storage medium and will involve a lookup of the header record for each process. It also requires your queue handler to be able to aggregate results, or your queue processor to store the results somewhere.
The reason we do not try to make the first queue processor also manage the final step of combining the results and sending the export is that it has a different resource utilisation profile and may itself fail and need to be retried. We move that workload to its own queue to take this into consideration.
The last step will be to notify the user that the export file is ready. If you do not have a PUSH mechanism, the client could periodically look up (poll) the header to determine the state. In general, a lookup query that checks the state of a flag can easily be optimised to have minimal overall impact if PUSH is not an option.
Apart from the traditional shopping-cart 3rd-party processing model you might have experienced, another commercial example of this is the MS Office 365 Exchange Mail Trace report export.
https://learn.microsoft.com/en-us/exchange/monitoring/trace-an-email-message/run-a-message-trace-and-view-results
The user creates a request and can specify an email address to receive the fulfilment notification. When the export is ready, an email will be sent; when the page is refreshed, the current state of the request is updated, and if the request is fulfilled, the user can download the export file.
- A nice touch is that the notification email is itself on a delay queue, so if the user has already accessed the file through the UI, the notification email is not sent.
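A sketch of that delayed handler, assuming the queue supports scheduled/delayed delivery and the header also records a download flag and a notification address (both hypothetical fields):

```python
def process_delayed_notification(message, db, mailer):
    header = db.load_header(message["header_id"])

    # If the user already fetched the file through the UI, skip the email.
    if header.downloaded:
        return

    mailer.send(
        to=header.notify_email,
        subject="Your export is ready",
        body=f"Export {header.reference_key} is ready for download.",
    )
```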
This might be overkill, but it is designed for extreme scale-out. Depending on your technology stack and chosen vendor, there may be OOTB solutions that abstract a lot of this away from you.
MS Azure Durable Functions provides a reliable way to abstract away all these queue, message-processing, aggregation, and trigger concepts into a single manageable workflow; it is designed specifically to simplify the code required to achieve a high level of scalability for problems like this.
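For example, the whole flow above collapses into Durable Functions' documented fan-out/fan-in pattern; a sketch in Python (the activity names `GetRecordIds`, `ExportRecord`, and `AggregateResults` are hypothetical activity functions you would implement):

```python
import azure.durable_functions as df

def orchestrator_function(context: df.DurableOrchestrationContext):
    # Fan out: one activity per record, replacing the record queue.
    record_ids = yield context.call_activity("GetRecordIds", None)
    tasks = [context.call_activity("ExportRecord", rid) for rid in record_ids]

    # Fan in: the framework tracks completion, replacing the header
    # counter and the aggregation-queue trigger.
    results = yield context.task_all(tasks)

    # Final aggregation step, replacing the second queue's worker.
    export_url = yield context.call_activity("AggregateResults", results)
    return export_url

main = df.Orchestrator.create(orchestrator_function)
```

The HTTP starter for an orchestration also returns status-query URLs to the caller, which gives you the check-state polling endpoint from the tl;dr out of the box.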