2

This is the follow up of compare sets

I have

Set<Set<Node>> NestedSet = new HashSet<Set<Node>>();

[[Node[0], Node[1], Node[2]], [Node[0], Node[2], Node[6]], [Node[3], Node[4], Node[5]]  [Node[2], Node[6], Node[7]] ]

I want to merge the sets when there are two elements in common. For example 0,1,2 and 0,2,6 has two elements in common so merging them to form [0,1,2,6].

Again [0,1,2,6] and [2,6,7] has 2 and 6 common. so merging them and getting [0,1,2,6,7].

The final output should be :

[  [Node[0], Node[1], Node[2], Node[6], Node[7]], [Node[3], Node[4], Node[5]] ]

I tried like this :

         for (Set<Node> s1 : NestedSet ) {
                        Optional<Set<Node>> findFirst = result.stream().filter(p -> { HashSet<Node> temp = new HashSet<>(s1); 
                        temp.retainAll(p); 
                        return temp.size() == 2; }).findFirst(); 

                        if (findFirst.isPresent()){

                            findFirst.get().addAll(s1); 
                        }
                        else {
                            result.add(s1);
                        }    

                    }

But the result I got was :

[[Node[0], Node[1], Node[2], Node[6], Node[7]], [Node[3], Node[4], Node[5]], [Node[0], Node[2], Node[6], Node[7]]]

Any idea ? Is there any way to get the desired output?

Community
  • 1
  • 1
priya
  • 375
  • 5
  • 22

3 Answers3

2

Some considerations:

  • Each time you apply a merge, you have to restart the procedure and iterate over the modified collection. Because of this, the iteration order of the input set is important, if you want your code to be deterministic you may want to use collections that give guarantees over their iteration order (e.g. use LinkedHashSet (not HashSet) or List.
  • Your current code has side effects as it modifies the supplied sets when merging. In general I think it helps to abstain from creating side effects whenever possible.

The following code does what you want:

static <T> List<Set<T>> mergeSets(Collection<? extends Set<T>> unmergedSets) {
  final List<Set<T>> mergedSets = new ArrayList<>(unmergedSets);

  List<Integer> mergeCandidate = Collections.emptyList();
  do {
    mergeCandidate = findMergeCandidate(mergedSets);

    // apply the merge
    if (!mergeCandidate.isEmpty()) {
      // gather the sets to merge
      final Set<T> mergedSet = Sets.union(
          mergedSets.get(mergeCandidate.get(0)),
          mergedSets.get(mergeCandidate.get(1)));

      // removes both sets using their index, starts with the highest index
      mergedSets.remove(mergeCandidate.get(0).intValue());
      mergedSets.remove(mergeCandidate.get(1).intValue());

      // add the mergedSet
      mergedSets.add(mergedSet);
    }
  } while (!mergeCandidate.isEmpty());

  return mergedSets;
}

// O(n^2/2)
static <T> List<Integer> findMergeCandidate(List<Set<T>> sets) {
  for (int i = 0; i < sets.size(); i++) {
    for (int j = i + 1; j < sets.size(); j++) {
      if (Sets.intersection(sets.get(i), sets.get(j)).size() == 2) {
        return Arrays.asList(j, i);
      }
    }
  }
  return Collections.emptyList();
}

For testing this method I created two helper methods:

static Set<Integer> set(int... ints) {
  return new LinkedHashSet<>(Ints.asList(ints));
}

@SafeVarargs
static <T> Set<Set<T>> sets(Set<T>... sets) {
  return new LinkedHashSet<>(Arrays.asList(sets));
}

These helper methods allow to write very readable tests, for example (using the numbers from the question):

public static void main(String[] args) {
    // prints [[2, 6, 7, 0, 1]]
    System.out.println(mergeSets(sets(set(0, 1, 2, 6), set(2, 6, 7))));
    // prints [[3, 4, 5], [0, 2, 6, 1, 7]]
    System.out.println(
      mergeSets(sets(set(0, 1, 2), set(0, 2, 6), set(3, 4, 5), set(2, 6, 7))));
}
rinde
  • 1,181
  • 1
  • 8
  • 20
  • Thank you very much for your help. I am not sure why is this .findFirst () is being called here ? After filtering, mergeCandidate holds two common elements ? – priya Jan 24 '16 at 23:36
  • The `Sets.cartesianProduct(..)` returns a set of pairs (list of size 2), for each pair we have to check if we can merge them. However, it's possible that there are multiple pairs matching your merge criterion, from your question it seems you want to merge only one pair at a time, therefore we select only one pair per iteration. – rinde Jan 25 '16 at 07:23
  • Thanks rinde again! now it's clear. But how about its complexity? is it possible to apply this to millions of sets? How far these stream () operations efficient? – priya Jan 25 '16 at 08:46
  • When reading your question I realized that using the Cartesian product was actually a bad idea as it does n^2 iterations instead of the only needed n^2/2. I updated the code (this time without using streams). – rinde Jan 25 '16 at 10:35
1

I'm not sure why you are getting that result, but I do see another problem with this code: It is order-dependent. For example, even if the code worked as intended, it would matter whether [Node[0], Node[1], Node[2]] is compared first to [Node[0], Node[2], Node[6]] or [Node[2], Node[6], Node[7]]. But Sets don't have a defined order, so the result is either non-deterministic or implementation-dependent, depending on how you look at it.

If you really want deterministic order-dependent operations here, you should be using List<Set<Node>>, rather than Set<Set<Node>>.

Ken Clubok
  • 1,238
  • 6
  • 11
1

Here's a clean approach using recursion:

public static <T> Set<Set<T>> mergeIntersectingSets(Collection<? extends Set<T>> unmergedSets) {
    boolean edited = false;
    
    Set<Set<T>> mergedSets = new HashSet<>();
    for (Set<T> subset1 : unmergedSets) {
        boolean merged = false;
        
        // if at least one element is contained in another subset, then merge the subsets
        for (Set<T> subset2 : mergedSets) {
            if (!Collections.disjoint(subset1, subset2)) {
                subset2.addAll(subset1);
                merged = true;
                edited = true;
            }
        }
        // otherwise, add the current subset as a new subset
        if (!merged) mergedSets.add(subset1);
    }
    
    if (edited) return mergeIntersectingSets(mergedSets); // continue merging until we reach a fixpoint
    else return mergedSets;
}
logi-kal
  • 7,107
  • 6
  • 31
  • 43