
I am looking for an existing implementation of a union-find or disjoint set data structure in Scala before I attempt to roll my own as the optimisations look somewhat complicated.

I mean this kind of thing - where the two operations union and find are optimised.

Does anybody know of anything existing? I've obviously tried googling around.

selig
  • 1
    Somewhat complicated? The union by rank optimization just adds some simple and intuitive conditions, and the path compression adds a single assignment (the pseudo code on Wikipedia obscures that, try rewriting it - but it's not like Wikipedia's version is any harder to comprehend). Implementing that is trivial (in stark contrast to coming up with it and analysing its asymptotic complexity). –  Jul 01 '13 at 17:06
  • 1
    See also: CR - [Scala: Disjoint-Sets](http://codereview.stackexchange.com/q/17621/15600). – Petr Jul 01 '13 at 17:43
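
The two optimisations mentioned in the comment above can be sketched in a few lines. This is only a minimal illustration, not an existing library class: the name `IntUnionFind` and its methods are hypothetical, and it assumes the elements are the integers `0 until n` (arbitrary keys could be mapped to indices first).

```scala
// Hypothetical sketch: classic array-based union-find over the integers 0 until n.
final class IntUnionFind(n: Int) {
  // Every element starts as the root of its own singleton set.
  private val parent: Array[Int] = Array.tabulate(n)(identity)
  // rank(r) is an upper bound on the height of the tree rooted at r.
  private val rank: Array[Int] = new Array[Int](n)
  private var sets: Int = n

  /** Number of disjoint sets currently tracked. */
  def count: Int = sets

  /** Returns the representative (root) of the set containing x. */
  def find(x: Int): Int = {
    // Path compression: the single extra assignment re-points x at its root.
    if (parent(x) != x) parent(x) = find(parent(x))
    parent(x)
  }

  /** Merges the sets containing a and b; returns false if they were already one set. */
  def union(a: Int, b: Int): Boolean = {
    val ra = find(a)
    val rb = find(b)
    if (ra == rb) false
    else {
      // Union by rank: attach the shallower tree beneath the deeper one.
      if (rank(ra) < rank(rb)) parent(ra) = rb
      else if (rank(ra) > rank(rb)) parent(rb) = ra
      else { parent(rb) = ra; rank(ra) += 1 }
      sets -= 1
      true
    }
  }

  /** True if a and b are in the same set. */
  def connected(a: Int, b: Int): Boolean = find(a) == find(b)
}
```

For example, after `val uf = new IntUnionFind(5)`, calling `uf.union(0, 1)` and `uf.union(3, 4)` leaves `uf.connected(0, 1)` true, `uf.connected(1, 3)` false, and `uf.count` equal to 3.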

2 Answers


I wrote one for myself some time ago, and I believe it performs decently. Unlike other implementations, find is O(1) and union is amortized O(log(n)). If you have many more union operations than find operations, it might not be very useful. I hope it helps:

package week2

import scala.collection.immutable.HashSet
import scala.collection.immutable.HashMap

/**
 * Union-find implementation.
 * Find is O(1).
 * Union is amortized O(log(n)).
 * The implementation uses a hash map. Each Wrap holds a set containing the elements of that Wrap.
 * When two Wraps are unioned, their sets are merged (the smaller into the larger).
 * A HashMap is also maintained to look up the Wrap associated with each node; it is updated on every union.
 *
 * Null entries in the input array are ignored.
 */
class UnionFind[T](all: Array[T]) {
  private var dataStruc = new HashMap[T, Wrap]
  for (a <- all if (a != null))
    dataStruc = dataStruc + (a -> new Wrap(a))

  var timeU = 0L
  var timeF = 0L

  /**
   * The number of disjoint sets remaining
   */
  private var size = dataStruc.size

  /**
   * Unions the set containing a and b
   */
  def union(a: T, b: T): Wrap = {
    val st = System.currentTimeMillis()
    val first: Wrap = dataStruc.get(a).get
    val second: Wrap = dataStruc.get(b).get
    if (first.contains(b) || second.contains(a))
      first
    else {
      // below is to merge smaller with bigger rather than other way around
      val firstIsBig = (first.set.size > second.set.size)
      val ans = if (firstIsBig) {
        first.set = first.set ++ second.set
        second.set.foreach { x =>
          // adding an existing key overwrites its mapping, so no removal is needed
          dataStruc = dataStruc + (x -> first)
        }
        first
      } else {
        second.set = second.set ++ first.set
        first.set.foreach { x =>
          // adding an existing key overwrites its mapping, so no removal is needed
          dataStruc = dataStruc + (x -> second)
        }
        second
      }
      timeU = timeU + (System.currentTimeMillis() - st)
      size = size - 1
      ans
    }
  }

  /**
   * true if they are in same set. false if not
   */
  def find(a: T, b: T): Boolean = {
    val st = System.currentTimeMillis()
    val ans = dataStruc.get(a).get.contains(b)
    timeF = timeF + (System.currentTimeMillis() - st)
    ans
  }

  def sizeUnion: Int = size

  class Wrap(e: T) {
    var set = new HashSet[T]
    set = set + e

    def add(elem: T): Unit = {
      set = set + elem
    }

    def contains(elem: T): Boolean = set.contains(elem)
  }
}
Michael Mior
Jatin
  • This looks like a mutable implementation of UnionFind (`var` and reassignment with `=`), but it uses immutable data structures (`immutable.HashMap` and `immutable.HashSet`), so I suspect it may not actually be O(1) and O(log(n)), but I have not yet thought it through completely... – drcicero Jan 11 '23 at 20:36

Here is a simple, short and somewhat efficient mutable implementation of UnionFind:

import scala.collection.mutable

class UnionFind[T]:
  private val map = new mutable.HashMap[T, mutable.HashSet[T]]
  private var size = 0
  def distinct = size

  def addFresh(a: T): Unit =
    assert(!map.contains(a))
    val set = new mutable.HashSet[T]
    set += a
    map(a) = set
    size += 1

  def setEqual(a: T, b: T): Unit =
    val ma = map(a)
    val mb = map(b)
    if !ma.contains(b) then
      // redirect the elements of the smaller set to the bigger set
      if ma.size > mb.size
      then
        ma ++= mb
        mb.foreach { x => map(x) = ma }
      else
        mb ++= ma
        ma.foreach { x => map(x) = mb }
      size = size - 1

  def isEqual(a: T, b: T): Boolean =
    map(a).contains(b)

Remarks:

  • An immutable implementation of UnionFind can be useful when rollback, backtracking, or proofs are necessary
  • A mutable implementation can avoid garbage collection for a speedup
  • One could also consider a persistent data structure -- it works like an immutable implementation, but internally uses some mutable state for speed
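
To illustrate the first remark, here is a minimal sketch of what an immutable variant might look like. The class name `ImmutableUnionFind` and its methods are hypothetical, and it assumes union by rank without path compression, so that `find` stays pure and costs O(log(n)) map lookups.

```scala
import scala.annotation.tailrec

// Hypothetical sketch: an immutable union-find in which every union returns a new value.
final case class ImmutableUnionFind[T](
    parent: Map[T, T] = Map.empty[T, T],
    rank: Map[T, Int] = Map.empty[T, Int]
) {

  /** Follows parent links up to the root; an element never seen before is its own root. */
  @tailrec
  def find(x: T): T = parent.get(x) match {
    case Some(p) if p != x => find(p)
    case _                 => x
  }

  /** Returns a new structure with the sets of a and b merged; `this` is left untouched. */
  def union(a: T, b: T): ImmutableUnionFind[T] = {
    val ra = find(a)
    val rb = find(b)
    if (ra == rb) this
    else {
      val rankA = rank.getOrElse(ra, 0)
      val rankB = rank.getOrElse(rb, 0)
      // Union by rank keeps the trees shallow even without path compression.
      if (rankA < rankB) copy(parent = parent.updated(ra, rb))
      else if (rankA > rankB) copy(parent = parent.updated(rb, ra))
      else copy(parent = parent.updated(rb, ra), rank = rank.updated(ra, rankA + 1))
    }
  }

  /** True if a and b are in the same set. */
  def connected(a: T, b: T): Boolean = find(a) == find(b)
}
```

Because older versions remain valid, rolling back is just a matter of keeping a reference: given `val v1 = ImmutableUnionFind[String]().union("a", "b")` and `val v2 = v1.union("b", "c")`, `v2.connected("a", "c")` is true while `v1.connected("a", "c")` is still false.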
drcicero