19

Generally, concurrent collections are safe to iterate; according to Javadoc: 'Iterators are weakly consistent, returning elements reflecting the state of the set at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations.' However, consider this:

import java.util.Random;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class ConcurrencyProblem {
    private static volatile boolean modifierIsAlive = true;

    public static void main(String[] args) {
        final ConcurrentSkipListSet<Integer> concurrentSet = new ConcurrentSkipListSet<>();
        Thread modifier = new Thread() {
            private final Random randomGenerator = new Random();

            public void run() {

                while (modifierIsAlive) {
                    concurrentSet.add(randomGenerator.nextInt(1000));
                    concurrentSet.remove(randomGenerator.nextInt(1000));
                }
            };
        };
        modifier.start();
        int sum = 0;
        while (modifierIsAlive) {
            try {
                TreeSet<Integer> sortedCopy = new TreeSet<>(concurrentSet);
                // make sure the copy operation is not eliminated by the compiler
                sum += sortedCopy.size();
            } catch (RuntimeException rte) {
                modifierIsAlive = false;
                rte.printStackTrace();
            }
        }
        System.out.println("Dummy output: " + sum);
    }
}

The output is

java.util.NoSuchElementException
at java.util.concurrent.ConcurrentSkipListMap$Iter.advance(ConcurrentSkipListMap.java:2299)
at java.util.concurrent.ConcurrentSkipListMap$KeyIterator.next(ConcurrentSkipListMap.java:2334)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2559)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2547)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2504)
at java.util.TreeMap.addAllForTreeSet(TreeMap.java:2462)
at java.util.TreeSet.addAll(TreeSet.java:308)
at java.util.TreeSet.<init>(TreeSet.java:172)
at mtbug.ConcurrencyProblem.main(ConcurrencyProblem.java:27)
Dummy output: 44910

I'm wondering if this is a bug or a feature; we did not get a ConcurrentModificationException, but still, having to care about iteration (falling back to synchronized blocks or otherwise) kind of defeats the purpose of ConcurrentSkipListSet/Map. I've been able to reproduce this both with Java 7 and 8 (currently, 8u72 on my Linux box).

Stuart Marks
  • 127,867
  • 37
  • 205
  • 259
Kofa
  • 191
  • 4
  • Per Javadoc, the iterators and spliterators returned by the `ConcurrentSkipListSet` are all [weakly consistent](http://stackoverflow.com/questions/20142493/fail-safe-iterators-and-weakly-consistent-iterators). – Kedar Mhaswade Feb 28 '16 at 14:52

2 Answers2

6

As far as I can understand from browsing the sources, the problem with TreeSet is that it calls size() before iterating and then uses it instead of calling hasNext(). This may be a bug, but I think it's just a consequence of red-black trees being complicated structures requiring careful balancing, and therefore knowing size in advance is needed to properly balance it in linear time during creation.

You may circumvent this by iterating manually and adding elements to the TreeSet, but this will lead to n log n complexity, which could be the reason why TreeSet's constructor doesn't do it that way (its API spec guarantees linear time). Of course it could still call hasNext() as it builds the tree, but then some additional actions may be required after the construction is finished to rebalance the tree, which could lead to amortized linear complexity. But red-black trees are a mess as they are, and that kind of hack would make the implementation even messier.

Still, I think it's very confusing and should probably be documented somewhere in the API docs, but I'm not sure where exactly. Probably in the part where they explain what weakly consistent iterators are. Specifically, it should be mentioned that some library classes rely on the returned size and therefore may throw NoSuchElementException. Mentioning specific classes would also help.

Sergei Tachenov
  • 24,345
  • 8
  • 57
  • 73
  • 1
    Arguably the appropriate place for such documentation would be the `TreeSet` constructor, noting that it does not rely solely on the iterator. – kdgregory Feb 28 '16 at 15:31
  • @kdgregory, I thought about that, but then it's a class that has nothing to do with concurrency, and someone looking for possible concurrency issues would probably look at the concurrency's docs, just thinking “OK, so this thing is totally thread-safe, cool, I don't have to worry about anything” and not even look at TreeSet's docs. – Sergei Tachenov Feb 28 '16 at 15:37
3

I'm actually starting to lean towards this being a bug in TreeSet/TreeMap (update, it is). The issue, as Sergey alludes, is that TreeMap caches the result of ConcurrentSkipListSet.size() before reading its elements.

In other words, it assumes the Collection it is passed will not be modified during construction, which is an erroneous assumption.

Note that even if buildFromSorted() did call Iterator.hasNext() its only option at that point would be to fail, since the backing data structure was modified mid-construction.

Looking at other collections that could potentially have some issue copying concurrent structures, including ArrayList, LinkedList, and CopyOnWriteArrayList (most other collections I looked at simply for-each over the elements), explicitly copy the provided collection to an array before doing any actual work in order to avoid this exact issue. I think TreeSet and TreeMap should be doing the same thing.

We actually don't have to accept O(n log n) performance due to this bug, but it's going to be a hack. We can't simply copy the values into an array or other data structure, because then inserting into the TreeSet won't be linear time. But we can lie to TreeSet by claiming the copy is a SortedSet.

public static class IterateOnlySortedSet<E>
    extends AbstractSet<E> implements SortedSet<E> {
  private final ArrayList<E> elements;
  private final Comparator<? super E> comparator;

  public IterateOnlySortedSet(SortedSet<E> source) {
    elements = new ArrayList<>(source);
    comparator = source.comparator();
  }

  @Override
  public Iterator<E> iterator() {
    return elements.iterator();
  }

  @Override
  public int size() {
    return elements.size();
  }

  @Override
  public Comparator<? super E> comparator() {
    return comparator;
  }

  // remaining methods simply throw UnsupportedOperationException
}

Changing your TreeSet construction line to:

TreeSet<Integer> sortedCopy = new TreeSet<>(new IterateOnlySortedSet<>(concurrentSet));

Now succeeds.

Nice find :)

dimo414
  • 47,227
  • 18
  • 148
  • 244