2

I have a large quantity of things, a thread that repeatedly iterates over them, and a separate thread that occasionally removes or adds individual things. The things are in a synchronized linked list:

private List<Thing> things = Collections.synchronizedList(new LinkedList<Thing>());

My first thread iterates over them:

while (true) {
    for (Thing thing : things) {
        thing.costlyOperation();
    }
    // ...
}

When my second thread adds or removes things, the above iterator blows up with a ConcurrentModificationException. However, it seems that removal and addition are permissible in my above case. The constraints are that (A) the costlyOperation method must not be called when a Thing is not present in the list, and (B) should be called promptly (on the current or the next iteration) when a Thing is present, and (C) additions and deletions via that second thread must complete quickly, and not wait.

I might synchronize on things before I enter my iteration loop, but this blocks the other thread for too long. Alternatively, I can modify Thing to include an isDead flag, and perform a nice quick iterator.remove() loop to get rid of dead things before the loop above, wherein I'd check the flag again. But I want to solve this problem near this for loop, with minimal pollution. I've also considered creating a copy of the list (a form of "fail-safe iteration"), but that seems to violate constraint A.

Will I need to code up my own collection and iterator? From scratch? Am I neglecting to see how the CME is actually a useful exception even in this case? Is there a data structure, strategy, or pattern that I could use, instead of that default iterator, to solve this problem?

mk.
  • 11,360
  • 6
  • 40
  • 54
  • 1
    How about a CopyOnWriteArrayList? http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CopyOnWriteArrayList.html – copeg Mar 24 '15 at 18:32
  • Would a producer/consumer model work for your requrements? Then you could avoid iterating at all, avoid the delays from running `costlyOperation()` calls in sequence, and remove items from the producer queue prior to execution if they become dead. – Kevin Condon Mar 24 '15 at 19:11
  • Your constraint A basically requires full synchronization on the list to work properly; depending on when in costlyOperation the "`thing` must be in `things`" check is, nothing else can really guarantee it. @KevinCondon has the best solution if you can relax that slightly, IMO. – Sbodd Mar 24 '15 at 19:15
  • @KevinCondon It might, but consumed items need to be re-inserted, which might be messy because it seems like you'd end up juggling two lists, and the second thread might add or remove a thing repeatedly, so it's hard to say. – mk. Mar 24 '15 at 19:27
  • @Sbodd yes but putting the "full synchronization" inside the loop as in yshavit's answer (rather than outside the whole loop) seems to satisfy (A), no? – mk. Mar 24 '15 at 19:27
  • Yes, if holding a lock for a single costlyOperation doesn't violate your constraint C, then yshavit's answer looks like it provides a good strict guarantee of A. – Sbodd Mar 24 '15 at 19:54
  • I added a second answer that uses an Executor to implement a producer/consumer model for this problem. – Kevin Condon Mar 24 '15 at 21:06

4 Answers4

3

You can get the CME when an iterator's underlying structure is modified while the iterator is being used -- regardless of any concurrency. In fact, it's quite easy to get one just on a single thread:

List<String> strings = new ArrayList<String>();
Iterator<String> stringsIter = strings.iterator();
strings.add("hello"); // modify the collection
stringsIter.next();   // CME thrown!

The exact semantics are up to the collection, but usually, the CME comes about any time you modify the collection after the iterator's been created. (That's assuming the iterator doesn't specifically allow concurrent access, of course, as some of the collections in java.util.concurrent do).

The naive solution is to synchronize over the whole while (true) loop, but of course you don't want to do that, because then you've got a lock over a bunch of expensive operations.

Instead, you should probably copy the collection (under a lock), and then operate on the copy. In order to make sure that the thing is still in things, you could double-check it within the loop:

List<Thing> thingsCopy;
synchronized (things) {
    thingsCopy = new ArrayList<Thing>(things);
}
for (Thing thing : thingsCopy) {
    synchronized (things) {
        if (things.contains(thing)) {
            thing.costlyOperation();
        }
    }
}

(You could also use one of the aforementioned java.util.concurrent collections that allow modifications while iterating, such as CopyOnWriteArrayList.)

Of course, now you still have the synchronized block around costlyOperation. You could move the contains check alone into the synchronized block:

boolean stillInThings;
synchronized (things) {
    stillInThings = things.contains(thing);
}

... but that only lowers the raceyness, it doesn't eliminate it. Whether that's good enough for you is up to your application's semantics. Generally, if you want the costly operation to only take place while the thing is in things, you'll need to put some sort of lock around it.

If that's the case, you may want to look at using a ReadWriteLock instead of synchronized blocks. It's a bit more dangerous (in that the language lets you get more things wrong, if you're not careful to always release the lock in a finally block), but could be worth it. The basic pattern would be to make the thingsCopy and double-check work under a reader lock, while modifications to the list happen under a writer lock.

yshavit
  • 42,327
  • 7
  • 87
  • 124
  • This is roughly the "fail-safe iteration" I'd mentioned - the problem was (besides copy seeming unnecessarily slow) that I might end up calling `costlyOperation` on a Thing that is in `thingsCopy`, but which is no longer in `things`, which violates constraint A. Could I work around that somehow? – mk. Mar 24 '15 at 18:41
2

You must synchronize when iterating

From the JavaDoc:

It is imperative that the user manually synchronize on the returned list when iterating over it:

List list = Collections.synchronizedList(new ArrayList());
...
synchronized (list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}

Failure to follow this advice may result in non-deterministic behavior.

Your conditions are not enough

You are using a linked list. Thus you have a structure similar to this:

class ListEntry{
    Thing data;
    ListEntry next;
}

When you iterate over things you pick one entry from the list and then you do something with the data. If this list entry is deleted while you're iterating, the iteration will terminate premature if next is set null or it may do something entirely different like iterate onto another list if the entry is recycled.

You need to devise a proper synchronization, your idea here of allowing concurrent modification and use of the data without synchronization is a recipe for disaster.

Solution Proposal

Add two lists pendingremoval and pendingaddition for which your background thread can add requests to. When your processing thread has finished one loop through the container, swap the pending* lists for empty new ones and process the removals in your processing thread synchronously.

Here is a really sloppy example to show you the idea:

public class Processor {

    private List<Thing> pendingRemoval;
    private List<Thing> pendingAddition;
    private List<Thing> things;

    public void add(Thing aThing) {
        pendingAddition.add(aThing);
    }

    public void remove(Thing aThing) {
        pendingRemoval.add(aThing);
    }

    public void run() {
        while (true) {
            for (Thing thing : things) {
                if (!pendingRemoval.contains(thing)) {
                    thing.costlyOperation();
                }
            }

            synchronized (pendingRemoval) {
                things.removeAll(pendingRemoval);
                pendingRemoval.clear();
            }

            synchronized (pendingAddition) {
                things.addAll(pendingAddition);
                pendingAddition.clear();
            }
        }
    }
}

Edit: Forgot the condition of not processing removed things

Edit2: In response to comments:

public class Processor {
    private Map<Thing, Integer> operations = new HashMap<Thing, Integer>();
    private List<Thing> things;

    public void add(Thing aThing) {
        synchronized (operations) {
            Integer multiplicity = operations.get(aThing);
            if (null == multiplicity) {
                multiplicity = 0;
            }
            operations.put(aThing, multiplicity + 1);
        }
    }

    public void remove(Thing aThing) {
        synchronized (operations) {
            Integer multiplicity = operations.get(aThing);
            if (null == multiplicity) {
                multiplicity = 0;
            }
            operations.put(aThing, multiplicity - 1);
        }
    }

    public void run() {
        while (true) {
            for (Thing thing : things) {
                Integer multiplicity;
                synchronized (operations) { 
                    multiplicity = operations.get(thing);
                }
                if (null == multiplicity || multiplicity > 0) {
                    thing.costlyOperation();
                }
            }

            synchronized (operations) {
                for (Map.Entry<Thing, Integer> operation : operations.entrySet()) {
                    int multiplicity = operation.getValue();
                    while(multiplicity<0){
                        things.remove(operation.getKey());
                    }
                    while(multiplicity>0){
                        things.add(operation.getKey());
                    }
                }
                operations.clear();
            }
        }
    }
}
Emily L.
  • 5,673
  • 2
  • 40
  • 60
  • The solution proposal is close to what I would suggest (one could consider a `Set` instead of a `List`, to have O(1) for the `contains` call, but these are details and depend on the exact application case). I think it's the only solution so far that does not involve copying, and is generically applicable (because it not does not require modifications of the "Thing" class) – Marco13 Mar 24 '15 at 19:32
  • If a thing is added then immediately removed, `things` will contain a Thing that shouldn't be there. How could I ensure this (or a similar scenario) doesn't happen? – mk. Mar 24 '15 at 20:31
  • @mk see the edit for a really primitive approach that should work. You can also implement a command class with addition/removal flag and the thing to add/remove. Then have a queue of these commands and process them in a suitable manner inside of the processing loop. – Emily L. Mar 24 '15 at 22:16
1

You could use something like ConcurrentLinkedQueue. It will however be possible to end up executing thing.costlyOperation() on an element that was removed from the list in another thread due to the weakly consistent iterator. If you can add a thread-safe dead property to the Thing class that short-circuits the action in costlyOperation(), then you could avoid the costly operation even with the weakly consistent iterator. Just set the element's property when removing it from the list.

Remove from list:

Thing thing = ...;
thing.setDead(true);
things.remove(thing);

Check in Thing:

private volatile boolean dead = false;

public void costlyOperation() {
    if (dead) { return; }
    // do costly stuff...
}
Kevin Condon
  • 1,618
  • 10
  • 16
1

Take advantage of Java's Executor implementations and use a producer/consumer model. Instead of adding to a list that you have to iterate, submit costlyOperation() to the executor in whatever thread is currently adding to a list.

private final Executor exec = ...;

private void methodThatAddsThings(final Thing t) {
    exec.execute(new Runnable() {
        public void run() {
          t.costlyOperation();
        }
    });
}

This seems to satisfy A, B, and C with no need to share a list or synchronize anything. You also now have options for scaling by bounding the work queue or limiting the number of processing threads.

If you need the ability to stop a pending costly operation, the thread that removes Things can use an ExecutorService instead of Executor. Calling submit() will return a Future that you can keep in a map. When removing a Thing, look up its Future in the map and call the Future's cancel() to prevent it from running.

Kevin Condon
  • 1,618
  • 10
  • 16