
Recently, I was surprised to learn that some Java collections do not have a constant-time size() method.

I learned that concurrent collection implementations make some compromises in exchange for better concurrency (size() is O(n) in ConcurrentLinkedQueue, ConcurrentSkipListSet, LinkedTransferQueue, etc.). The good news is that this is properly documented in the API documentation.

What concerns me is the performance of size() on the views returned by some collections' methods. For example, TreeSet.tailSet returns a view of the portion of the backing set whose elements are greater than or equal to fromElement. What surprised me a lot is that calling size() on the returned SortedSet takes linear time, that is, O(n). At least that is what I managed to dig up from the OpenJDK source code: TreeSet is implemented as a wrapper over TreeMap, and within TreeMap there is an EntrySetView class whose size method is as follows:

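abstract class EntrySetView extends AbstractSet<Map.Entry<K,V>> {
    // Cached size of the view (-1 means "not computed yet") and the
    // backing map's modCount at the time the cache was filled.
    private transient int size = -1, sizeModCount;

    public int size() {
        // An unbounded view covers the whole map, so the map's own size applies.
        if (fromStart && toEnd)
            return m.size();
        // Recount if nothing is cached or the backing map has changed since.
        if (size == -1 || sizeModCount != m.modCount) {
            sizeModCount = m.modCount;
            size = 0;
            Iterator i = iterator();
            // Walk every entry in the view: this is the O(n) part.
            while (i.hasNext()) {
                size++;
                i.next();
            }
        }
        return size;
    }

    ....
}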

This means that the first call to size is O(n), and the result is then cached for as long as the backing map is not modified. I was not able to find this fact in the API documentation. A more efficient implementation would be O(log n), at the memory cost of caching subtree sizes. Since such trade-offs are already made to avoid code duplication (TreeSet as a wrapper over TreeMap), I don't see why they should not be made for performance reasons.

Regardless of whether my (very brief) analysis of the OpenJDK implementation of TreeSet is right or wrong, I would like to know: is there detailed and complete documentation on the performance of such operations, especially the ones that are completely unexpected?
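To make the subtree-size idea concrete, here is a rough sketch of what I mean. It is not how TreeMap works; SizedBst and tailSize are names I made up for illustration, and the tree is deliberately unbalanced to keep the code short, so the walk is O(height) here and would be O(log n) only in a balanced tree such as a red-black tree.

```java
import java.util.TreeSet;

/** Hypothetical illustration: a plain (unbalanced) BST that caches subtree sizes. */
public class SizedBst {
    private static final class Node {
        final int key;
        Node left, right;
        int size = 1; // number of nodes in the subtree rooted at this node
        Node(int key) { this.key = key; }
    }

    private Node root;

    private static int size(Node n) { return n == null ? 0 : n.size; }

    public int size() { return size(root); }

    public boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key == n.key) return true;
            n = key < n.key ? n.left : n.right;
        }
        return false;
    }

    public boolean add(int key) {
        if (contains(key)) return false; // set semantics: no duplicates
        root = insert(root, key);
        return true;
    }

    private static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else n.right = insert(n.right, key);
        n.size = 1 + size(n.left) + size(n.right); // the extra bookkeeping per node
        return n;
    }

    /** Counts keys >= from by walking a single root-to-leaf path. */
    public int tailSize(int from) {
        int count = 0;
        Node n = root;
        while (n != null) {
            if (from <= n.key) {
                // This node and its whole right subtree are in the tail.
                count += 1 + size(n.right);
                n = n.left;
            } else {
                n = n.right;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SizedBst tree = new SizedBst();
        TreeSet<Integer> reference = new TreeSet<>();
        for (int k : new int[] {50, 20, 80, 10, 30, 70, 90, 60}) {
            tree.add(k);
            reference.add(k);
        }
        // Both counts should agree; the TreeSet one iterates over the view.
        System.out.println(tree.tailSize(30) + " vs " + reference.tailSet(30).size());
    }
}
```

The cost is one extra int per node plus the work of keeping it up to date on every insert (and on removals and rotations in a real balanced tree), which is the memory trade-off I am referring to.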

mario
  • I would think that the Javadocs include that information. – Thilo Mar 29 '13 at 12:21
  • Interesting question (+1). I can't help on the documentation front (I've not seen the complexity explicitly documented). However, I personally find `tailSet()`'s behaviour perfectly intuitive. I think it would be more surprising to expect everyone to pay a memory penalty so that a marginal use case would have better performance. – NPE Mar 29 '13 at 12:31
  • @NPE Do you agree with memory penalty that we all have from having every Set implemented as a wrapper for Map just for JDK developers not having to implement same features twice? :) I think I made it rather clear that my issue is with documentation and not performance itself. What confused me is that TreeMap.size is O(1), TreeMap.tailSet is O(log N) and there is no information for TreeMap.tailSet().size() and I know it can be O(log n) and O(n). – mario Mar 29 '13 at 12:41

1 Answer

For example, TreeSet.tailSet returns a view of the portion of backing set whose elements are greater than or equal to fromElement. What surprised me a lot is that calling size on returned SortedSet is linear in time, that is O(n).

To me it is not surprising. Consider this sentence from the javadoc:

"The returned set is backed by this set, so changes in the returned set are reflected in this set, and vice-versa."

Since the tail set is a dynamic view of the backing set, it follows that its size has to be calculated dynamically in practice. The alternative would require that when a change was made to the backing set, it would have to adjust the sizes of all extant tailset (and headset) views. That would make updates to the backing set more expensive, AND it would present a storage management problem. (In order to update the view sizes, the backing set would need references to all existing view sets ... and that is a potential hidden memory leak.)
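A small sketch of the "dynamic view" behaviour described above, using only the standard TreeSet API. The class and method names (TailSetViewDemo, observeSizes) are made up for the example.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class TailSetViewDemo {
    /**
     * Returns {view size at creation, view size after adds to the backing set,
     * view size after a remove through the view, final backing set size}.
     */
    static int[] observeSizes() {
        TreeSet<Integer> backing = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5));
        SortedSet<Integer> tail = backing.tailSet(3); // live view of {3, 4, 5}
        int initial = tail.size();

        backing.add(10); // inside the view's range, so the view grows
        backing.add(0);  // outside the view's range, so the view is unaffected
        int afterAdds = tail.size();

        tail.remove(4);  // writes through to the backing set
        int afterRemove = tail.size();

        return new int[] {initial, afterAdds, afterRemove, backing.size()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observeSizes()));
    }
}
```

The view never gets told about `backing.add(10)`; it simply has to produce the right answer the next time size() is asked for, which is why the size is worked out on demand.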

Now you do have a point regarding the documentation. But in fact, the javadocs say nothing about the complexity of the view collections. And, indeed, they don't even document that TreeSet.size() is O(1)! In fact, they only document the complexity of the add, remove and contains operations.


I would like to know is there a detailed and complete documentation on performance of many such operations especially ones which are completely unexpected?

AFAIK, no. Certainly not from Sun / Oracle ...

Stephen C
  • I do understand all that but the fact is that there are different approaches that all have different tradeoffs and it's not documented anywhere which approach is chosen. – mario Mar 29 '13 at 13:25
  • @mario - yes, but I'm not the correct person to complain to. (And, yes, from where I'm sitting ... you ARE complaining.) – Stephen C Mar 29 '13 at 14:52