1

Assume that I have a set of objects that need to be analyzed in two different ways, both of which take a relatively long time and involve I/O calls. I am trying to figure out how, or whether, I could optimize this part of my software, especially by making use of multiple processors (the machine I am working on, for example, is an 8-core i7 that almost never goes above 10% load during execution).

I am quite new to parallel-programming or multi-threading (not sure what the right term is), so I have read some of the prior questions, particularly paying attention to highly voted and informative answers. I am also in the process of going through the Oracle/Sun tutorial on concurrency.

Here's what I have come up with so far:

  • A thread-safe collection holds the objects to be analyzed
  • As soon as there are objects in the collection (they come a couple at a time from a series of queries), a thread per object is started
  • Each thread takes care of the initial pre-analysis preparation for its object and then calls the analyses.
  • The two analyses are implemented as Runnables/Callables, and are invoked by the thread when necessary.

And my questions are:

  1. Is this a reasonable scheme? If not, how would you go about doing this?
  2. In order to make sure things don't get out of hand, should I implement a ThreadManager or something of that sort, which starts and stops threads and reassigns them when they are done? For example, if I have 256 objects to be analyzed and 16 threads in total, the ThreadManager assigns the first finished thread to the 17th object to be analyzed, and so on.
  3. Is there a dramatic difference between Runnable and Callable other than the fact that Callable can return a result? Otherwise, should I try to implement my own interface, and if so, why?

Thanks,

AJcodez
posdef

4 Answers

3

Your idea is basically sound. However, rather than creating threads directly, or indirectly through some kind of ThreadManager of your own design, use an Executor from Java's concurrency package (java.util.concurrent). It does everything you need, and other people have already taken the time to write and debug it. An executor manages a queue of tasks, so you don't need to worry about providing the thread-safe queue yourself either.

There's no difference between Callable and Runnable except that the former returns a value and may throw a checked exception. Executors will handle both, and treat them the same.
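
As a rough illustration of that point, here is a minimal sketch that submits one Runnable and one Callable to the same ExecutorService. The class name `RunnableVsCallable` and the `analyze` method are hypothetical stand-ins, not anything from the question; anonymous classes are used so the two interfaces are visible side by side.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RunnableVsCallable {
    // Hypothetical stand-in for one analysis: returns the input length.
    static int analyze(String input) {
        return input.length();
    }

    static int runBoth(final String input) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Runnable: no return value, so the Future only signals completion.
            Future<?> done = pool.submit(new Runnable() {
                @Override public void run() { analyze(input); }
            });
            // Callable: the Future carries the result back to the caller.
            Future<Integer> result = pool.submit(new Callable<Integer>() {
                @Override public Integer call() { return analyze(input); }
            });
            done.get();          // returns null once the Runnable has finished
            return result.get(); // blocks until the Callable's value is ready
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // An exception thrown inside a task surfaces here, wrapped.
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runBoth("example"));
    }
}
```

Both tasks go through the same `submit` call; the only visible difference on the caller's side is what `Future.get()` hands back.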

It's not clear to me whether you're planning to make the preparation step a separate task to the analyses, or fold it into one of them, with that task spawning the other analysis task halfway through. I can't think of any reason to strongly prefer one to the other, but it's a choice you should think about.
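
To make the two options concrete, here is a hedged sketch. `prepare`, `analysisOne`, and `analysisTwo` are invented placeholders for the real work. The first method runs everything for one object in sequence; the second prepares first and then runs both analyses in parallel with `invokeAll`. It creates a small pool of its own only to stay self-contained; real code would more likely share one pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PrepThenAnalyses {
    // Hypothetical placeholders for the real preparation and analyses.
    static String prepare(String raw) { return raw.trim(); }
    static int analysisOne(String prepared) { return prepared.length(); }
    static int analysisTwo(String prepared) { return prepared.split("\\s+").length; }

    // Option 1: preparation and both analyses folded into one unit of work.
    static int[] analyzeSequentially(String raw) {
        String prepared = prepare(raw);
        return new int[] { analysisOne(prepared), analysisTwo(prepared) };
    }

    // Option 2: prepare once, then run the two analyses as separate tasks.
    static int[] analyzeInParallel(String raw) {
        final String prepared = prepare(raw);
        List<Callable<Integer>> tasks = new ArrayList<>();
        tasks.add(() -> analysisOne(prepared));
        tasks.add(() -> analysisTwo(prepared));

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // invokeAll blocks until all tasks finish and returns the
            // futures in the same order as the task list.
            List<Future<Integer>> results = pool.invokeAll(tasks);
            return new int[] { results.get(0).get(), results.get(1).get() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

One caution with the second option: a task that submits subtasks to the same fixed-size pool it is running in and then waits for them can starve the pool, which is why the sketch waits from the calling thread instead.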

Tom Anderson
  • Thanks for the tip on Executors, sounds like they might save a lot of time and nerves. By the way, I am not quite sure what you mean by the last paragraph; would you care to elaborate a bit? – posdef Feb 09 '11 at 10:47
  • For each object to be processed, as I understand it, there are three bits of work to be done: preparation, the first kind of analysis, and the second kind of analysis. Do you want to have a callable for each, or do the preparation in the same callable as one of the analyses? Or have I misunderstood? – Tom Anderson Feb 09 '11 at 12:05
  • Well no, I can't say you misunderstood it, though I don't know if it's worth making preparation a callable of its own. I was essentially thinking something along the lines of the following pseudocode: `Thread.run(){ prep; call1.run(); call2.run() }` – posdef Feb 09 '11 at 13:14
  • If you did that, the analyses would not run in parallel; is that what you want? – Tom Anderson Feb 09 '11 at 14:58
  • Well analyses 1 and 2 would not be in parallel, but object 1, ... object N would hopefully run in parallel. That was my initial idea. – posdef Feb 10 '11 at 09:32
  • Okay, then everything's okay. I had the idea that you also wanted to run the two analyses for each object in parallel. If not, then there's nothing else to worry about. – Tom Anderson Feb 10 '11 at 13:53
3
  1. You could use a BlockingQueue implementation to hold your objects and spawn your threads from there. This interface is based on the producer-consumer principle. The put() method will block if your queue is full until there is some more space and the take() method will block if the queue is empty until there are some objects again in the queue.

  2. An ExecutorService can help you manage your pool of threads.

  3. If you are awaiting a result from your spawned threads, then the Callable interface is a good choice: you can start the computation early and carry on with other work, collecting the results later from the returned Futures. As for the differences from the Runnable interface, from the Callable javadoc:

    The Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception.
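
The blocking behaviour of put() and take() described in the first point can be sketched as follows. This is an illustration under assumptions, not the asker's code: the class name `QueueHandoff`, the `DONE` sentinel, and the "sum of lengths" work are all made up for the example.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueHandoff {
    // Hypothetical sentinel telling the consumer that no more items follow.
    private static final String DONE = "__done__";

    // Produces the items on one thread and consumes them on the caller's
    // thread, returning the total length of everything consumed.
    static int consumeAll(final List<String> items, int capacity) {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);

        Thread producer = new Thread(() -> {
            try {
                for (String item : items) {
                    queue.put(item); // blocks while the queue is full
                }
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int total = 0;
        try {
            while (true) {
                String item = queue.take(); // blocks while the queue is empty
                if (DONE.equals(item)) {
                    break;
                }
                total += item.length();
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }
}
```

With a capacity of 1, the producer can never get more than one item ahead of the consumer, which is the back-pressure a bounded queue gives you.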

Some general things you need to consider in your quest for Java concurrency:

  • Visibility between threads is not guaranteed by default. volatile, AtomicReference and the other classes in the java.util.concurrent.atomic package are your friends.
  • You need to carefully ensure atomicity of compound actions using synchronization and locks.
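
A small sketch of both bullets, with made-up names (`SharedStateSketch`, `BoundedCounter`, `hammer`): an AtomicInteger handles a single shared counter, while a check-then-act sequence is guarded by one lock so that no other thread can interleave between the check and the update.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedStateSketch {
    // Compound action (check, then increment) made atomic with a lock.
    static final class BoundedCounter {
        private final int max;
        private int value;

        BoundedCounter(int max) { this.max = max; }

        synchronized boolean tryIncrement() {
            if (value < max) {
                value++;
                return true;
            }
            return false;
        }

        synchronized int get() { return value; }
    }

    // Runs `threads` workers that each make `perThread` attempts.
    // Returns { total attempts, final bounded value }.
    static int[] hammer(int threads, final int perThread, int max) {
        final AtomicInteger attempts = new AtomicInteger();
        final BoundedCounter bounded = new BoundedCounter(max);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    attempts.incrementAndGet(); // atomic read-modify-write
                    bounded.tryIncrement();     // atomic check-then-act
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { attempts.get(), bounded.get() };
    }
}
```

Replacing the AtomicInteger with a plain `int` and `++`, or dropping `synchronized` from `tryIncrement`, would allow lost updates or a value above `max` under contention.
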
dimitrisli
2

The Executors class provides factory methods for creating thread pools. Specifically, Executors#newFixedThreadPool(int nThreads) creates a thread pool with a fixed size that uses an unbounded queue. Also, if a thread terminates due to a failure, a new thread will take its place. So in your specific example of 256 tasks and 16 threads you would call

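import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// create pool
ExecutorService threadPool = Executors.newFixedThreadPool(16);
// submit a task; run() holds the work for one object
Runnable task = new Runnable() {
    @Override public void run() { /* analyze one object here */ }
};
threadPool.submit(task);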

The important question is determining the proper number of threads for your thread pool. See if this helps: Efficient Number of Threads
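
As a starting point only, one commonly cited rule of thumb (from Java Concurrency in Practice) sizes the pool as cores × target utilization × (1 + wait time / compute time). The sketch below encodes that heuristic; the class and method names are made up here, and the wait-to-compute ratio is an assumption you would have to measure for your own I/O-heavy analyses.

```java
class PoolSizing {
    // Heuristic only: cores * utilization * (1 + wait/compute), at least 1.
    static int suggestedThreads(int cores, double targetUtilization, double waitToComputeRatio) {
        double raw = cores * targetUtilization * (1.0 + waitToComputeRatio);
        return Math.max(1, (int) Math.round(raw));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Assumed ratio: tasks spend as long waiting on I/O as computing.
        System.out.println(suggestedThreads(cores, 1.0, 1.0));
    }
}
```

For purely CPU-bound work the ratio is close to zero and the suggestion collapses to roughly one thread per core; tasks that mostly wait on I/O justify a larger pool.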

Community
richs
  • Thanks, it really gives me a good idea how to start. Especially the thread on deciding on the number of threads is a good read. Say, if I decide to go with `Callable` instead of `Runnable`, in order to return the results of the analysis, how would that work with the `ExecutorService` ? Any complications to be expected? – posdef Feb 10 '11 at 09:36
0

Sounds reasonable, but it's not as trivial to implement as it may seem. Maybe you should check the jsr166y project. That's probably the easiest solution to your problem.

proactif