4

I would like to have an iterator that can be read by multiple threads concurrently so that I can process the data of the iterator's source in parallel. The challenge is that I can't really couple hasNext() with its logical next() as those could go to different threads. (That is, two threads can call hasNext(), each see true, and then have the second thread fail because there was only one item.) My problem is that for some sources I don't really know if it has a next element until I try to read it. One such example is reading lines from a file; another is reading Term instances from a Lucene index.

I was thinking of setting up a queue inside the iterator and feeding the queue with a separate thread. That way, hasNext() is implemented in terms of the queue size. But I don't see how I could guarantee that the queue is filled because that thread could get starved.

Should I ignore the Iterator contract, and just call next() exhaustively until a NoSuchElementException is thrown?

Is there a more elegant way of handling the problem?

Gene Golovchinsky

5 Answers

7

Can your threads just pull from a BlockingQueue instead of an Iterator? As you have discovered, Iterators are not well suited for concurrent access.

Pass a LinkedBlockingQueue, and have your threads do queue.poll() until nothing is left.

sbridges
  • Thanks for the quick response. This makes sense, but I need to fill the queue in a way that makes sure it's not starved, right? I cannot put all elements into the queue because there are potentially too many to fit into memory. I suppose that in addition to the queue I would have to have an `AtomicBoolean` that indicates that no more records will be added to the queue. – Gene Golovchinsky May 05 '11 at 05:53
  • You can have your threads do a queue.take(), and have some sort of poison pill (http://www.javaspecialists.eu/archive/Issue016.html) to signal that there is nothing left to do. – sbridges May 05 '11 at 06:00
  • Wouldn't this still have the potential for more than one thread calling `take()` and having one of them get the poison pill, while the second one hangs on an empty queue? This implies that the `take()` must be done in the main thread that then dispatches the values to the worker threads. Or am I missing something? – Gene Golovchinsky May 05 '11 at 06:32
  • If you have N threads, put N poison pills on the queue. Something like an ExecutorService might serve your needs better as well. – sbridges May 05 '11 at 13:42
  • I came up with a better solution than interrupting threads. When a thread consumes the poison pill from the queue, it sets an AtomicBoolean telling the launching thread to stop creating new jobs. It then puts the poison pill back on the queue, so that any other job that is waiting for completion will terminate gracefully. When I get the chance, I will edit my original question to add the code that does this. – Gene Golovchinsky May 05 '11 at 18:58
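
The approach worked out in the comments above (a bounded queue, blocking `take()`, and one poison pill per worker) can be sketched roughly as follows. This is only an illustration under assumptions: the class name `PoisonPillDemo`, the helper `processAll`, the queue capacity of 16, and the counter standing in for real work are all made up for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class PoisonPillDemo {
    // Sentinel compared by identity (==), so it cannot collide with real data.
    private static final String POISON = new String("POISON");

    /** Hypothetical helper: feeds the source into a bounded queue drained by worker threads. */
    static int processAll(Iterator<String> source, int workers) throws InterruptedException {
        // Bounded, so the producer blocks instead of loading everything into memory.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        AtomicInteger processed = new AtomicInteger();

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (true) {
                        String item = queue.take(); // blocks until something arrives
                        if (item == POISON) {
                            return; // each worker consumes exactly one pill
                        }
                        processed.incrementAndGet(); // stand-in for real work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }

        // Only the calling thread touches the iterator, so hasNext()/next() stay paired.
        while (source.hasNext()) {
            queue.put(source.next());
        }
        for (int i = 0; i < workers; i++) {
            queue.put(POISON); // one pill per worker
        }
        for (Thread t : threads) {
            t.join();
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(processAll(lines.iterator(), 3)); // prints 5
    }
}
```

Because the workers block in `take()` rather than polling, a slow producer only delays them; it cannot make them exit early on a temporarily empty queue.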
1

One workaround / escape comes to my mind, to keep (most of) the contract and avoid NoSuchElementExceptions: The iterator.next() could return a custom "End-of-iteration" marker object, that can be processed but is nothing but a dummy. So if one thread receives a true for hasNext() but another thread already grabbed the last item, then the first thread will get a dummy (instead of an exception).

You should be able to use this kind of iterator in all normal use cases, and single-threaded uses should not even notice the difference. It should be usable with the enhanced for loop too.

It will only fail if one tries to wait for NoSuchElementException instead of checking hasNext(), because that exception will not be thrown because of the dummy items.
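
A minimal sketch of this idea, assuming a wrapper around an ordinary `Iterator`: the names `SentinelIterator` and `endMarker` are hypothetical, and the marker is whatever dummy object the caller chooses. Workers must check for it by identity.

```java
import java.util.Arrays;
import java.util.Iterator;

public class SentinelIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final T endMarker; // dummy handed out once the source is exhausted

    public SentinelIterator(Iterator<T> source, T endMarker) {
        this.source = source;
        this.endMarker = endMarker;
    }

    @Override
    public synchronized boolean hasNext() {
        // Only a hint under concurrency: another thread may take the item first.
        return source.hasNext();
    }

    @Override
    public synchronized T next() {
        // Check and fetch under one lock; return the dummy instead of throwing.
        return source.hasNext() ? source.next() : endMarker;
    }

    public static void main(String[] args) {
        String end = new String("END"); // compared by identity below
        SentinelIterator<String> it =
                new SentinelIterator<>(Arrays.asList("only").iterator(), end);

        // Simulate two threads that both saw hasNext() == true.
        boolean seenByA = it.hasNext();
        boolean seenByB = it.hasNext();
        String a = it.next(); // gets the real item
        String b = it.next(); // gets the dummy, not a NoSuchElementException

        System.out.println(seenByA + " " + seenByB + " " + a + " " + (b == end));
        // prints: true true only true
    }
}
```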

Andreas Dolk
  • Yeah, I use that with a queue. What I do in this case is when the dummy comes up, the worker thread ignores the job but puts the dummy back in the queue. That way all the workers eventually get it. – Gene Golovchinsky May 07 '11 at 07:11
0

The chosen answer will work, but it introduces complexity and potentially unnecessary buffering. Why not ignore Iterator’s contract and write your own:

public interface ConcurrentIterator<T> {

    T next() throws EndOfIterationException;

}

This will be thread-safe if your implementation is. It can even wrap an Iterator.
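
As a rough sketch of wrapping an `Iterator` this way: the definition of `EndOfIterationException` below is an assumption (the answer names it but does not define it), and `wrap` and `ConcurrentIteratorDemo` are hypothetical names. The lock makes the `hasNext()`/`next()` pair a single atomic step.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentIteratorDemo {
    /** Assumed definition: the answer names this exception but does not define it. */
    static class EndOfIterationException extends Exception {}

    interface ConcurrentIterator<T> {
        T next() throws EndOfIterationException;
    }

    /** Hypothetical helper that wraps a plain Iterator. */
    static <T> ConcurrentIterator<T> wrap(Iterator<T> source) {
        Object lock = new Object();
        return () -> {
            synchronized (lock) {
                // Check and read under the same lock, so no thread can slip in between.
                if (!source.hasNext()) {
                    throw new EndOfIterationException();
                }
                return source.next();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentIterator<Integer> it = wrap(Arrays.asList(1, 2, 3, 4, 5, 6).iterator());
        AtomicInteger sum = new AtomicInteger();

        Runnable worker = () -> {
            try {
                while (true) {
                    sum.addAndGet(it.next());
                }
            } catch (EndOfIterationException e) {
                // Normal termination: the source is exhausted.
            }
        };

        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(sum.get()); // prints 21
    }
}
```

Unlike the queue-based approach, nothing is buffered here; the cost is that every call to `next()` is serialized on the lock.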

alexantd
0

As an up-to-date answer, I think you should use a ConcurrentLinkedQueue, which has been available since Java 1.5.

maxauthority
-1

I could be missing the point, but couldn't you use a synchronized block in this situation?

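// Lock on the shared iterator so that the check and the read happen atomically.
synchronized (iterator) {
    if (iterator.hasNext()) {
        element = iterator.next();
    }
}

Here, while one thread holds the lock, no other thread that synchronizes on the same iterator will be able to access it.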

elekwent