Merge Sets of Sets that contain common elements in Scala

Question

I want to implement a function in Scala that, given a Set of Sets of Ints, merges any of the contained Sets that share one or more elements.

So for example, given:

def mergeSets(sets: Set[Set[Int]]): Set[Set[Int]] = ???

val sets = Set(Set(1,2), Set(2,3), Set(3,7), Set(8,10))  
val mergedSets = mergeSets(sets)

mergedSets will contain Set(Set(1,2,3,7), Set(8,10))

What would be a nice, efficient and, if possible, functional way to do this in Scala?

samthebest · Accepted Answer · score 7 · 2014-09-02T12:18:49.727

The most efficient way to do this will be using mutable structures, but you asked for a functional way, so here goes:

sets.foldLeft(Set.empty[Set[Int]])((cum, cur) => {
  val (hasCommon, rest) = cum.partition(_ & cur nonEmpty)
  rest + (cur ++ hasCommon.flatten)
})

(Not tested, wrote this using phone)
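
For readers on a newer compiler: the postfix `nonEmpty` in the fold above may warn or fail to compile on recent Scala versions unless postfix operators are enabled. Below is a minimal sketch of the same fold with a named lambda parameter and explicit parentheses, wrapped in the question's signature. The object name `MergeSetsSketch` and the parameter name `group` are illustrative assumptions, not part of the original answer.

```scala
object MergeSetsSketch {
  // Same fold as above, wrapped in the question's signature.
  def mergeSets(sets: Set[Set[Int]]): Set[Set[Int]] =
    sets.foldLeft(Set.empty[Set[Int]]) { (cum, cur) =>
      // Split the groups built so far into those that share an element
      // with cur and those that do not.
      val (hasCommon, rest) = cum.partition(group => (group & cur).nonEmpty)
      // Collapse cur and every overlapping group into a single group.
      rest + (cur ++ hasCommon.flatten)
    }
}
```

This works because the accumulator always holds pairwise disjoint groups: each new set is merged with every group it touches, so no two groups in `cum` can share an element afterwards.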

edited Sep 02 '14 at 12:18

answered Sep 02 '14 at 12:05

samthebest

Doesn't compile (you can't have _ in parens in the filter). I edited it to fix that and it works on the one test case I tried :) – The Archetypal Paul Sep 02 '14 at 12:10
@Paul Thx, and there was actually a bug, which I just fixed. Also I think -- is deprecated, so I changed it to filterNot – samthebest Sep 02 '14 at 12:13
Can you use `partition` to split `cum` into the sets with intersections and the rest? Might be a bit clearer? – The Archetypal Paul Sep 02 '14 at 12:13
@Paul ... genius! Will do – samthebest Sep 02 '14 at 12:14
+1 Short and elegant, and with some chance of a casual reader working out what it actually does! Not sure the latest edit has made it clearer, though, as the precedence isn't immediately obvious. – The Archetypal Paul Sep 02 '14 at 12:19
@Paul precedence is clear; it has to be the expected way or one would get a type mismatch. This isn't Python or JS!! ;) Static typing FTW – samthebest Sep 02 '14 at 12:24
Right, but that doesn't make it obvious, just unambiguous. I still think it's clearer with parens. But this is just preferred style, so a matter of opinion only. – The Archetypal Paul Sep 02 '14 at 12:29
Yea, this is nicer than my approach. – Kigyo Sep 02 '14 at 12:44
+1, I would have also used mutable variables here, the pure functional way is pretty ugly. – Ende Neu Sep 02 '14 at 16:25
I think this pure functional way is pretty. Not pretty ugly. – The Archetypal Paul Sep 02 '14 at 19:09

score 1 · Answer 2 · answered Sep 05 '14 at 06:38

A version that's largely in the spirit of samthebest's answer, but (by design) less deeply idiomatic. It may be more approachable for those new to functional programming. (It seems we should squeeze everything we can out of such a nice problem.)

def mergeSets(sets: Set[Set[Int]]): Set[Set[Int]] = {
  if (sets.isEmpty) {
    Set.empty[Set[Int]]
  } else {
    val cur = sets.head
    val merged = mergeSets(sets.tail)
    val (hasCommon, rest) = merged.partition(s => (s & cur).nonEmpty)
    rest + (cur ++ hasCommon.flatten)
  }
}

However, the following alternative has the advantage of being tail recursive and perhaps also providing a smoother path to understanding samthebest's answer:

def mergeSets(cum: Set[Set[Int]], sets: Set[Set[Int]]): Set[Set[Int]] = {
  if (sets.isEmpty) {
    cum
  } else {
    val cur = sets.head
    val (hasCommon, rest) = cum.partition(s => (s & cur).nonEmpty)
    mergeSets(rest + (cur ++ hasCommon.flatten), sets.tail)
  }
}

def mergeSets(sets: Set[Set[Int]]): Set[Set[Int]] = 
  mergeSets(Set.empty[Set[Int]], sets)

I don't claim either of these as superior: just useful as learning tools.

Eddie Carlson · Answer 3 · 2015-04-11T17:52:07.643

Samthebest's terse solution is very satisfying in its simplicity and elegance, but I am working with a large number of sets and needed a more performant solution that is still immutable and written in good functional style.

For 10,000 sets with 10 elements each (randomly chosen ints from 0 to 750,000), samthebest's terse solution averaged about 30 s on my computer, while my solution below averaged about 400 ms.

(In case anyone was wondering, the resultant set for the above set cardinalities contains ~ 3600 sets, with an average of ~ 26 elements each)
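
For anyone who wants to try a comparison like this, here is a rough sketch of how such input could be generated and timed. It is not the code used for the numbers above: the names `BenchmarkInputSketch`, `randomSet`, `randomSets` and `timed`, and the use of a fixed seed, are all assumptions made for illustration.

```scala
import scala.util.Random

object BenchmarkInputSketch {
  // Build one set of exactly `size` distinct ints drawn from 0 to maxValue (inclusive).
  // Assumes size <= maxValue + 1; otherwise this would never terminate.
  def randomSet(rng: Random, size: Int, maxValue: Int): Set[Int] =
    Iterator.iterate(Set.empty[Int])(_ + rng.nextInt(maxValue + 1))
      .dropWhile(_.size < size)
      .next()

  // Build up to `count` such sets (identical sets collapse, since the result is a Set).
  def randomSets(count: Int, size: Int, maxValue: Int, seed: Long): Set[Set[Int]] = {
    val rng = new Random(seed)
    Iterator.fill(count)(randomSet(rng, size, maxValue)).toSet
  }

  // Run `body` once and return its result with the elapsed wall-clock milliseconds.
  def timed[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }
}
```

Something like `timed(mergeSets(randomSets(10000, 10, 750000, seed = 1L)))` would then give one measurement. A single run on the JVM is noisy, so several warm-up runs and an average are advisable before comparing implementations.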

If anyone can see any improvements I could make with respect to style or performance, please let me know!

Here's what I came up with:

val sets = Set(Set(1, 2), Set(2, 3), Set(4, 5))
Association.associate(sets) // => Set(Set(1, 2, 3), Set(4, 5))


object Association {

  // Keep track of all current associations, as well as every element in any current association
  case class AssociationAcc[A](associations: Set[Set[A]] = Set.empty[Set[A]], all: Set[A] = Set.empty[A]) {
    def +(s: Set[A]) = AssociationAcc(associations + s, all | s)
  }

  // Add the newSet to the set associated with key A
  // (or simply insert if there is no such key).
  def updateMap[A](map: Map[A, Set[A]], key: A, newSet: Set[A]) = {
    map + (key -> (map.getOrElse(key, Set.empty) ++ newSet))
  }

  // Turn a Set[Set[A]] into a map where each A points to a set of every other A
  // it shared any set with.
  //
  // e.g. sets = Set(Set(1, 2), Set(2, 3), Set(4, 5))
  //     yields: Map(1 -> Set(2), 2 -> Set(1, 3), 3 -> Set(2),
  //                 4 -> Set(5), 5 -> Set(4))
  def createAssociationMap[A](sets: Set[Set[A]]): Map[A, Set[A]] = {
    sets.foldLeft(Map.empty[A, Set[A]]) { case (associations, as) =>
      as.foldLeft(associations) { case (assoc, a) => updateMap(assoc, a, as - a) }
    }
  }

  // Given a map where each A points to a set of every A it is associated with,
  // and also given a key A starting point, return the total set of associated As.
  //
  // e.g. with map = Map(1 -> Set(2), 2 -> Set(1, 3), 3 -> Set(2),
  //                     4 -> Set(5), 5 -> Set(4))
  // and key = 1 (or 2 or 3) yields: Set(1, 2, 3).
  // with key = 4 (or 5) yields: Set(4, 5)
  def getAssociations[A](map: Map[A, Set[A]], key: A, hit: Set[A] = Set.empty[A]): Set[A] = {
    val newAssociations = map(key) &~ hit
    newAssociations.foldLeft(newAssociations | hit + key) {
      case (all, a) => getAssociations(map, a, all)
    }
  }

  // Given a set of sets that may contain common elements, associate all sets that
  // contain common elements (i.e. take union) and return the set of associated sets.
  //
  // e.g. Set(Set(1, 2), Set(2, 3), Set(4, 5)) yields: Set(Set(1, 2, 3), Set(4, 5))
  def associate[A](sets: Set[Set[A]]): Set[Set[A]] = {
    val associationMap = createAssociationMap(sets)
    associationMap.keySet.foldLeft(AssociationAcc[A]()) {
      case (acc, key) =>
        if (acc.all.contains(key)) acc
        else                       acc + getAssociations(associationMap, key)
    }.associations
  }
}

score 0 · Answer 4 · answered Sep 02 '14 at 13:08

This is probably just a variant of samthebest's answer, but for the sake of variety:

  def mergeSets(sets: Set[Set[Int]]): Set[Set[Int]] = {
    def hasIntersect(set: Set[Int]): Boolean = 
      sets.count(set.intersect(_).nonEmpty) > 1

    val (merged, rejected) = sets partition hasIntersect
    Set(merged.flatten, rejected.flatten)
  }

rompetroll

Given `Set(Set(1, 2), Set(2, 3), Set(4, 5), Set(5, 6))` your function will give `Set(Set(1, 2, 3, 4, 5, 6))` but the desired result is `Set(Set(1, 2, 3), Set(4, 5, 6))`. Right? Similarly all disjoint sets get merged into one set. – samthebest Sep 02 '14 at 13:36
Ah, I see that we understand the problem differently then. The way I read it, the output should be Set(allMembersFromSetsWithDuplicates, membersOfSetsWithoutDuplicates). Whereas you read it as grouping of connected sets. – rompetroll Sep 02 '14 at 13:45
It would be good if the OP could give a more complex example (e.g. the output of `Set(Set(1,2), Set(2,3), Set(3,7), Set(8,10), Set(5,6), Set(6,9))`), but I interpret it @samthebest's way – The Archetypal Paul Sep 02 '14 at 13:56
I could see why one may interpret that allMembersFromSetsWithDuplicates should be one set, but not the sets without. Why merge the sets that do not have stuff in common? I'm suggesting you update your last line to be `rejected + merged.flatten`. I also suggest using `sets.exists(set.intersect(_).nonEmpty)`, it's more efficient and a tad more readable. – samthebest Sep 02 '14 at 14:05

spyk · Answer 5 · 2018-06-09T08:28:33.477

This problem can also be easily modelled in a disjoint-set (or union-find) data structure https://en.wikipedia.org/wiki/Disjoint-set_data_structure.

This will provide logarithmic time performance in most cases. I have uploaded a gist that works as a modified UnionFind algorithm and provides a mergeSets method to return the merged sets. This can be further optimized with path compression to allow almost constant time performance: https://gist.github.com/spyk/fa7ad42baa7abbf50337409c24c44303
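
To make the idea concrete without following the link, here is a minimal union-find sketch. It is not the code from the gist: the object name `UnionFindSketch`, the mutable parent map, and the choice of path halving as the compression strategy are assumptions made for this illustration.

```scala
import scala.collection.mutable

object UnionFindSketch {
  def mergeSets[A](sets: Set[Set[A]]): Set[Set[A]] = {
    // parent(x) is x's parent in the forest; a root points to itself.
    val parent = mutable.Map.empty[A, A]

    // Find the root of x, shortening the path as we go (path halving).
    @annotation.tailrec
    def find(x: A): A = {
      val p = parent.getOrElseUpdate(x, x)
      if (p == x) x
      else {
        val grandparent = parent(p) // every stored parent is also a key
        parent(x) = grandparent
        find(grandparent)
      }
    }

    // Join the trees containing a and b.
    def union(a: A, b: A): Unit = {
      val (ra, rb) = (find(a), find(b))
      if (ra != rb) parent(ra) = rb
    }

    // Link every element of each input set to that set's first element.
    // Empty input sets contribute nothing and do not appear in the result.
    for (s <- sets; first <- s.headOption; x <- s) union(first, x)

    // Group all seen elements by their root.
    parent.keySet.toSet.groupBy(find).values.toSet
  }
}
```

With the question's example, `UnionFindSketch.mergeSets(Set(Set(1,2), Set(2,3), Set(3,7), Set(8,10)))` yields `Set(Set(1,2,3,7), Set(8,10))`. The sketch uses local mutable state for brevity; union by rank or size could be added on top if the trees turn out to be deep in practice.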
