5

I know, it seems like a stupid question, you would expect that the time complexity of size() on any collection would be O(1) - but I'm finding that an "optimization" in my code which requires a call to size() is actually slowing things down.

So, what is the time complexity of size() for Sets in Java?

My code is an implementation of a recursive algorithm to find the maximal cliques in a graph (not important). Basically, the optimization just checks whether two Sets have equal size (these Sets are being constructed either way), and allows for only one more recursive call (stopping the recursion after that) if they are equal in size.

Here is a (simplified) version of my code:

        private static void recursivelyFindMaximalCliques(Set<Integer> subGraph, Set<Integer> candidates, Set<Integer> notCandidates) {
    boolean noFurtherCliques = false;

    Iterator<Integer> candidateIterator = candidates.iterator();
    while (candidateIterator.hasNext()) {
        int nextCandidate = candidateIterator.next();
        candidateIterator.remove();
        subGraph.add(nextCandidate);

        Set<Integer> neighbors = getNeighbors(nextCandidate);
        Set<Integer> newCandidates = calculateIntersection(candidates, neighbors);
        Set<Integer> newNotCandidates = calculateIntersection(notCandidates, neighbors);

        //if (newCandidates.size() == candidates.size())
        //  noFurtherCliques = true;
        recursivelyFindMaximalCliques(subGraph, newCandidates, newNotCandidates);
        //if (noFurtherCliques)
        //  return;

        subGraph.set.remove(nextCandidate);
        notCandidates.set.add(nextCandidate);
    }
}

The lines I have commented out are the ones in question - you can see that they check if the sets newCandidates and candidates are the same size, and if they are, the recursion is only allowed to go one level deeper.

When the lines are uncommented, the code runs about 10% slower - this is true whether the sets used are HashSets, TreeSets, or LinkedHashSets. This makes no sense, since those lines ensure that there will be FEWER recursive calls.

The only thing I can assume is that the call to size() on the sets is taking a long time. Does calling size() on Sets in Java take longer than O(1)?

EDIT

Since some people have asked, here is calculateIntersection():

private static IntegerSet calculateIntersection(Set<Integer> setA, Set<Integer> setB) {
    if (setA.size() == 0 || setB.size() == 0)
        return new Set<Integer>();

    Set<Integer> intersection = new Set<Integer>(); //Replace this with TreeSet, HashSet, or LinkedHashSet, whichever is being used
    intersection.addAll(setA);
    intersection.retainAll(setB);
    return intersection;
}

SECOND EDIT Here is the full code, if you like. I hesitated to post it, since it's long and nasty, but people asked, so here it is:

public class CliqueFindingAlgorithm {

private static class IntegerSet {
    public Set<Integer> set = new TreeSet<Integer>();  //Or whatever Set is being used
}


private static ArrayList<IntegerSet> findMaximalCliques(UndirectedGraph graph) {
    ArrayList<IntegerSet> cliques = new ArrayList<IntegerSet>();
    IntegerSet subGraph = new IntegerSet();
    IntegerSet candidates = new IntegerSet();
    IntegerSet notCandidates = new IntegerSet();

    for (int vertex = 0; vertex < graph.getNumVertices(); vertex++) {
        candidates.set.add(vertex);
    }

    recursivelyFindMaximalCliques(cliques, graph, subGraph, candidates, notCandidates);
    return cliques;
}


private static void recursivelyFindMaximalCliques(ArrayList<IntegerSet> cliques, UndirectedGraph graph, 
        IntegerSet subGraph, IntegerSet candidates, IntegerSet notCandidates) {
    boolean noFurtherCliques = false;

    Iterator<Integer> candidateIterator = candidates.set.iterator();
    while (candidateIterator.hasNext()) {
        int nextCandidate = candidateIterator.next();
        candidateIterator.remove();
        subGraph.set.add(nextCandidate);

        IntegerSet neighbors = new IntegerSet();
        neighbors.set = graph.getNeighbors(nextCandidate);
        IntegerSet newCandidates = calculateIntersection(candidates, neighbors);
        IntegerSet newNotCandidates = calculateIntersection(notCandidates, neighbors);

        if (newCandidates.set.size() == candidates.set.size())
            noFurtherCliques = true;
        recursivelyFindMaximalCliques(cliques, graph, subGraph, newCandidates, newNotCandidates);
        if (noFurtherCliques)
            return;

        subGraph.set.remove(nextCandidate);
        notCandidates.set.add(nextCandidate);
    }

    if (notCandidates.set.isEmpty()) {
        IntegerSet clique = new IntegerSet();
        clique.set.addAll(subGraph.set);
        cliques.add(clique);
    }
}


private static IntegerSet calculateIntersection(IntegerSet setA, IntegerSet setB) {
    if (setA.set.size() == 0 || setB.set.size() == 0)
        return new IntegerSet();

    IntegerSet intersection = new IntegerSet();
    intersection.set.addAll(setA.set);
    intersection.set.retainAll(setB.set);
    return intersection;
}

}

public class UndirectedGraph {

// ------------------------------ PRIVATE VARIABLES ------------------------------

private ArrayList<TreeMap<Integer, Double>> neighborLists;
private int numEdges;


// ------------------------------ CONSTANTS ------------------------------  

// ------------------------------ CONSTRUCTORS ------------------------------

public UndirectedGraph(int numVertices) {
    this.neighborLists = new ArrayList<TreeMap<Integer, Double>>(numVertices);
    this.numEdges = 0;
    for (int i = 0; i < numVertices; i++) {
        this.neighborLists.add(new TreeMap<Integer, Double>());
    }
}


// ------------------------------ PUBLIC METHODS ------------------------------


public void addEdge(int vertexA, int vertexB, double edgeWeight) {
    TreeMap<Integer, Double> vertexANeighbors = this.neighborLists.get(vertexA);
    TreeMap<Integer, Double> vertexBNeighbors = this.neighborLists.get(vertexB);

    vertexANeighbors.put(vertexB, edgeWeight);
    vertexBNeighbors.put(vertexA, edgeWeight);
    this.numEdges++;
}


public List<Integer> computeCommonNeighbors(int vertexA, int vertexB) {
    List<Integer> commonNeighbors = new ArrayList<Integer>();
    Iterator<Integer> iteratorA = this.getNeighbors(vertexA).iterator();
    Iterator<Integer> iteratorB = this.getNeighbors(vertexB).iterator();

    if (iteratorA.hasNext() && iteratorB.hasNext()) {
        int nextNeighborA = iteratorA.next();
        int nextNeighborB = iteratorB.next();
        while(true) {

            if (nextNeighborA == nextNeighborB) {
                commonNeighbors.add(nextNeighborA);
                if (iteratorA.hasNext() && iteratorB.hasNext()) {
                    nextNeighborA = iteratorA.next();
                    nextNeighborB = iteratorB.next();
                }
                else
                    break;
            }

            else if (nextNeighborA < nextNeighborB) {
                if (iteratorA.hasNext())
                    nextNeighborA = iteratorA.next();
                else
                    break;
            }

            else if (nextNeighborB < nextNeighborA) {
                if (iteratorB.hasNext())
                    nextNeighborB = iteratorB.next();
                else
                    break;
            }
        }
    }

    return commonNeighbors;
}

// ------------------------------ PRIVATE METHODS ------------------------------

private class EdgeIterator implements Iterator<int[]> {

    private int vertex;
    private int[] nextPair;
    private Iterator<Integer> neighborIterator;

    public EdgeIterator() {
        this.vertex = 0;
        this.neighborIterator = neighborLists.get(0).keySet().iterator();
        this.getNextPair();
    }

    public boolean hasNext() {
        return this.nextPair != null;
    }

    public int[] next() {
        if (this.nextPair == null)
            throw new NoSuchElementException();
        int[] temp = this.nextPair;
        this.getNextPair();
        return temp;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    private void getNextPair() {
        this.nextPair = null;

        while (this.nextPair == null && this.neighborIterator != null) {
            while (this.neighborIterator.hasNext()) {
                int neighbor = this.neighborIterator.next();

                if (this.vertex <= neighbor) {
                    this.nextPair = new int[] {vertex, neighbor};
                    return;
                }
            }

            this.vertex++;
            if (this.vertex < getNumVertices())
                this.neighborIterator = neighborLists.get(this.vertex).keySet().iterator();
            else
                this.neighborIterator = null;
        }   
    }
}

// ------------------------------ GETTERS & SETTERS ------------------------------

public int getNumEdges() {
    return this.numEdges;
}


public int getNumVertices() {
    return this.neighborLists.size();
}


public Double getEdgeWeight(int vertexA, int vertexB) {
    return this.neighborLists.get(vertexA).get(vertexB);
}


public Set<Integer> getNeighbors(int vertex) {
    return Collections.unmodifiableSet(this.neighborLists.get(vertex).keySet());
}


public Iterator<int[]> getEdgeIterator() {
    return new EdgeIterator();
}

}

Zarjio
  • 1,097
  • 2
  • 12
  • 22
  • 1
    HashSets just return a variable. so its O(1) – FDinoff May 22 '13 at 03:09
  • Something is fishy. The code above, as is, will not run 10% slower with 2 size() calls regardless of the Set implementation you use in `calculateIntersection()`. There must be something else you aren't showing. – Sotirios Delimanolis May 22 '13 at 03:25
  • I've posted all the (relevant) code. It's long and nasty, and I don't think it will clarify anything. – Zarjio May 22 '13 at 03:30
  • When people on SO ask for code to reproduce the problem, they generally (always?) want a [SSCCE](http://sscce.org/) which has the problem, the whole problem and nothing but the problem – Craig May 22 '13 at 03:33
  • Sorry, I attempted to provide a simple example in the original post. The code as a whole is fairly complex and I can't see any way of simplifying it. As mentioned in the question, it all boils down to the fact that comparing the size of two Sets seems to make the program run substantially slower, even though this is supposed to be an optimization which will reduce the number of recursive calls. – Zarjio May 22 '13 at 03:37
  • 1
    Your optimization returns before adding nextCandidate to notCandidates and removing it from the graph. Since nothing else modifies notCandidates or the graph, the previous caller in the recursion has a higher chance of finding that notCandidates.set.isEmpty(), which causes an object allocation and an addAll() -- which is certainly _not_ an O(1) operation. So is this enough to account for 10% more time? I don't know what the baseline is (e.g. 100 ms --> 110 ms or 60 sec --> 66 sec?). – William Price May 22 '13 at 03:51
  • @WilliamPrice: Thank you for that. You've helped me find a bug in the code. I'm still examining it, but it is probably the cause of the performance drop. I should know better than to jump on StackOverflow the first time I have a problem, gah! – Zarjio May 22 '13 at 03:59
  • @Zarjio: Glad to help; also note that I wouldn't have been able to point that out if all I had to go on was your original "simplified" version. Having all of the relevant code definitely helps! – William Price May 22 '13 at 04:02

2 Answers2

7

It depends on the implementation; for example HashSet.size() simply calls size() on its internal hashMap which returns a variable;

//HashSet
public int size() {
    return map.size();
}

//Hashmap
public int size() {
    return size;
}
Craig
  • 1,390
  • 7
  • 12
  • 1
    I'll accept your answer Craig, since it is technically correct, although I realize now this was a terrible question and hopefully it will be deleted :/ – Zarjio May 22 '13 at 04:11
4

It depends on the implementation. For example, consider SortedSet.subSet. That returns a SortedSet<E> (which is therefore a Set<E>) but I certainly wouldn't expect the size() operation on that subset to be O(1).

You haven't said what kind of set you're using, nor exactly what the calculateIntersection methods do - but if they're returning views onto existing sets, I wouldn't be at all surprised to hear that finding the size of that view is expensive.

You've talked about HashSet, TreeSet, and LinkedHashSet, all of which are O(1) for size()... but if the calculateIntersection method originally takes one of those sets and creates a view from it, that could well explain what's going on. It would help if you'd give more details - ideally a short but complete program we could use to reproduce the problem.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • @SotiriosDelimanolis: Yup, I'd missed that - but I *suspect* there's more to it than that. Will edit. – Jon Skeet May 22 '13 at 03:13
  • Jon - I've updated the question with the implementation of calculateIntersection() - it just returns another Set of the same type as the two sets that went into it. – Zarjio May 22 '13 at 03:19
  • @Zarjio: That code is invalid. `Set` is an interface. There's no point in providing code which won't compile... we'd like a complete example which we can use to reproduce the problem. – Jon Skeet May 22 '13 at 03:21
  • Yes, I have tried TreeSet, HashSet, and LinkedHashSet for that constructor, and all suffer from the same performance drop. – Zarjio May 22 '13 at 03:22
  • @Zarjio: Well it still would be helpful to see this in a short but complete program - including how you're benchmarking it. – Jon Skeet May 22 '13 at 03:31
  • I've posted all the (relevant) code now. The full program would take forever to read - suffice it to say I have generated 5 graphs, and timed the method findMaximalCliques() on each one, testing it with all three types of Sets mentioned, with and without the lines commented out. When the lines in question are uncommented, the code runs (on average) roughly 10% slower. – Zarjio May 22 '13 at 03:34
  • 1
    @Zarjio: You don't have to post the full program - just a complete program which demonstrates the problem. We still haven't got a `main` method... there's nothing we can actually *run*. Have you done any diagnostics to find out whether the condition is ever actually satisfied, by the way? – Jon Skeet May 22 '13 at 03:38
  • Just tested that, yes, the condition is satisfied (lots). I'll try to work up something simpler. – Zarjio May 22 '13 at 03:40