0

I have the following code:

    Set<Set<Integer>> groups = new HashSet<>();

    for (int[] edge : edges) {
        Set<Integer> match1 = null;
        Set<Integer> match2 = null;

        for(Set<Integer> group : groups) {
            if(group.contains(edge[0])) match1 = group;
            if(group.contains(edge[1])) match2 = group;
        }

        if(match1 != null && match1 == match2) {
            result = edge;
        }else if(match1 != null && match2 != null && match1 != match2) {
            match1.addAll(match2);
            groups.remove(match2);     <---- This does not remove match2 from set
        }else{
            Set<Integer> newGroup = new HashSet<>();
            newGroup.add(edge[0]);
            newGroup.add(edge[1]);
            groups.add(newGroup);
        }
        
        .........
    }

The groups is a Set of Set of Integers.

However, the groups.remove(match2) method does not remove the set of integers from the groups.

What happened here?

ZZZ
  • 645
  • 4
  • 17
  • 8
    Storing mutable things (like other sets) in a hash set is usually a bad idea... – Sweeper Jun 25 '21 at 10:00
  • The brute way is to write an Implemantation of Set with a hash function returning a static value and the usual eqauls function. This would be slow but should work, since all sets are placed in the same bucket of the super set. – Turo Jun 25 '21 at 10:41

1 Answers1

5

Quoting the Set interface documentation:

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

You are violating this rule when you mutate match1 with the addAll() call while it's still inside the outer Set. The mutation causes the hash code to change without the outer Set knowing about it, so the inner Set is now in the wrong hash bucket and can't be found when you try to remove it in some subsequent iteration.

As a workaround, you can remove it, then call addAll(), then add it again. Internally, this will cause it to be placed in the correct hash bucket for its new hash code.


By the way, it looks like you're implementing a connected component detection algorithm. You can do that more efficiently using a disjoint set forest data structure. There's some good pseudocode in How to store each set in a disjoint set forest.

Thomas
  • 174,939
  • 50
  • 355
  • 478