
I have a data file with thousands of rows. I am reading them and saving them in the database. I want to multi-thread this process in batches of, say, 50 rows. As I read the file, 10 rows at a time are submitted to an ExecutorService.

ExecutorService executor = Executors.newFixedThreadPool(5);

I can do the following in a while loop until I run out of rows:

 Future<Integer> future = executor.submit(callableObjectThatSaves10RowsAtOneTime);

But I don't want to read the entire file into memory if processing 10 rows takes a long time. I only want to submit 5 tasks, wait until one of the threads returns, and then submit the next.

Let's say a thread takes 20 seconds to save 10 records. I don't want the ExecutorService to be fed thousands of lines just because the reading process keeps reading and submitting to the ExecutorService.

What is the best way to achieve this?

Anthony Dito
Giovanny
  • Possible duplicate of http://stackoverflow.com/questions/1250643/how-to-wait-for-all-threads-to-finish-using-executorservice – Cratylus Oct 28 '15 at 21:41
  • @Cratylus this is definitely not a duplicate of the question you linked. The OP is asking how to throttle the number of tasks submitted to avoid having to read a huge file all at once, not how to know when all tasks are completed. – CodeBlind Oct 28 '15 at 22:12

1 Answer


You can do this with a LinkedList<Future<?>> that stores futures until you've reached some pre-determined size. Here's some skeleton code that should get you most of the way there:

import java.util.LinkedList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

int threads = 5;
ExecutorService service = Executors.newFixedThreadPool(threads);
LinkedList<Future<?>> futures = new LinkedList<>();

//As long as there are rows to save:
while(moreRowsLeft()){
    //submit another callable and remember its future:
    futures.addLast(service.submit(new RowSavingCallable()));

    //if the queue is "full", wait for the oldest task to finish before
    //reading any more of the file (get() throws InterruptedException
    //and ExecutionException, which the caller has to handle):
    while(futures.size() >= 2*threads) futures.removeFirst().get();
}

//All rows have been submitted but some may still be writing to the DB:
for(Future<?> f : futures) f.get();

//All rows have been saved at this point, so the pool can be released:
service.shutdown();

You may wonder why I've allowed the number of pending futures to reach twice the number of threads in the pool - this keeps the executor service threads busy with database saves while the main thread is creating more work to be done. It helps hide any I/O cost of reading the file and building more callables while the worker threads are busy doing the database writes.
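For anyone who wants to try the idea end to end, here is a minimal, self-contained sketch of the same throttling loop. It is only an illustration: `ThrottledSaveDemo`, `saveBatch`, `saveAll` and the `maxPending` parameter are made-up names, the "rows" are just strings from an in-memory iterator standing in for the file, and the sleep stands in for the real database write.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThrottledSaveDemo {
    // Hypothetical stand-in for the database write; returns how many rows were "saved".
    static int saveBatch(List<String> batch) throws InterruptedException {
        Thread.sleep(5); // pretend the insert is slow
        return batch.size();
    }

    // Pulls rows lazily from the iterator and blocks the reader whenever
    // maxPending batches are already waiting in (or running on) the pool.
    static int saveAll(Iterator<String> rows, int batchSize, int threads, int maxPending)
            throws Exception {
        ExecutorService service = Executors.newFixedThreadPool(threads);
        Deque<Future<Integer>> pending = new ArrayDeque<>();
        int saved = 0;
        try {
            while (rows.hasNext()) {
                // Read only one batch worth of rows at a time.
                List<String> batch = new ArrayList<>(batchSize);
                while (rows.hasNext() && batch.size() < batchSize) {
                    batch.add(rows.next());
                }
                Callable<Integer> task = () -> saveBatch(batch);
                pending.addLast(service.submit(task));

                // Queue is "full": wait for the oldest task before reading more.
                while (pending.size() >= maxPending) {
                    saved += pending.removeFirst().get();
                }
            }
            // Everything is submitted; drain whatever is still running.
            while (!pending.isEmpty()) {
                saved += pending.removeFirst().get();
            }
        } finally {
            service.shutdown();
        }
        return saved;
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 95; i++) rows.add("row-" + i);
        // 5 threads, at most 2*5 batches pending, 10 rows per batch.
        System.out.println(saveAll(rows.iterator(), 10, 5, 10)); // prints 95
    }
}
```

Because each future is removed from the queue as soon as `get()` has been called on it, no future is waited on twice, and the reader never holds more than `maxPending` batches in memory.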

CodeBlind
  • @CodeBlind - Thank you! I do have a question. Shouldn't we start removing from the first element of the LinkedList? The first one that was added to the LinkedList would have a higher chance of being returned first after completing the task. Can we potentially plug in ExecutorService.take() to your algorithm and optimize? – Giovanny Oct 29 '15 at 13:58
  • @Giovanny yes, you're right, a typo on my part :) I've fixed it. As for plugging in `ExecutorCompletionService.take()` - seems reasonable to me. You could just use a counter to keep track of how many you've submitted and call `take()` when you've exceeded some threshold. – CodeBlind Oct 29 '15 at 17:55
  • The last loop will call 'get' on the Futures that we had already called 'get' on. This will not do any additional processing, right? – Giovanny Nov 13 '15 at 15:10
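
The counter-based variant with `ExecutorCompletionService.take()` mentioned in the comments might look roughly like the sketch below. This is an assumption about how one could wire it up, not code from the answer: `CompletionServiceThrottle`, `saveBatch`, `saveAll` and `maxInFlight` are hypothetical names, and the batches come from an in-memory iterator instead of a real file.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionServiceThrottle {
    // Hypothetical stand-in for the database write; returns how many rows were "saved".
    static int saveBatch(List<String> batch) throws InterruptedException {
        Thread.sleep(5);
        return batch.size();
    }

    // Submits batches one by one, but once maxInFlight tasks are outstanding
    // it waits for whichever task finishes first before submitting another.
    static int saveAll(Iterator<List<String>> batches, int threads, int maxInFlight)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        ExecutorCompletionService<Integer> completion = new ExecutorCompletionService<>(pool);
        int inFlight = 0; // tasks submitted but not yet collected
        int saved = 0;
        try {
            while (batches.hasNext()) {
                List<String> batch = batches.next();
                completion.submit(() -> saveBatch(batch));
                inFlight++;

                if (inFlight >= maxInFlight) {
                    // take() blocks until any one task has completed.
                    saved += completion.take().get();
                    inFlight--;
                }
            }
            // Collect the results of the tasks that are still outstanding.
            while (inFlight > 0) {
                saved += completion.take().get();
                inFlight--;
            }
        } finally {
            pool.shutdown();
        }
        return saved;
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> batches = new ArrayList<>();
        for (int b = 0; b < 7; b++) {
            List<String> batch = new ArrayList<>();
            for (int r = 0; r < 3; r++) batch.add("row-" + b + "-" + r);
            batches.add(batch);
        }
        System.out.println(saveAll(batches.iterator(), 5, 10)); // prints 21
    }
}
```

The difference from the list-of-futures approach is that `take()` returns the first task to finish rather than the oldest one submitted, so a single slow batch does not hold up the reader while other workers sit idle.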