I have a process which requires streaming data from a database and passing the records off to an external server for processing before returning the results to store back in the database.
Get database row from table A
Hand off to external server
Receive result
insert database row into table B
Currently this is a single-threaded operation, and the bottleneck is the external server process and so I would like to improve performance by using other instances of the external server process to handle requests.
Get 100 database rows from table A
For each row
Hand off to external server 1
Receive Result
insert database row into table B
In parallel get 100 database rows from table A
For each row
Hand off to external server 2
Receive Result
insert database row into table B
Problem 1 I have been investigating Java thread pools, and dispatching records to the external servers this way, however I'm not sure how to fetch records from the database as quickly as possible without the workers rejecting new tasks. Can this be done with thread pools? What architecture should be used to achieve this?
Problem 2 At present I have optimised the database inserts by using batch statements and only executing once 2000 records have been processed. Would it be possible to adopt a similar approach within the workers?
Any help in structuring a solution to this problem would be greatly appreciated.