Dealing with Union-Find algorithms with a lot of objects

Question

I have a problem with (not anymore with stackoverflow (hehe)) Find algorithm when trying to implement UnionFind structure algorithm with path-compression.

I have standard array of ints, array can get pretty big -> it works fine until 60.000.000 elements.

My Union function looks like this:

public void unite(int p, int q) {
    if(p >= 0 && p < id.length && q >= 0 && q < id.length){
        if (isInSameSet(p, q)) return;
        id[find(p)] = find(q); 
        stevilo--;
    }
}

My isInSameSet looks like this:

    public boolean isInSameSet(int p, int q) {
        if(p >= 0 && p < id.length && q >= 0 && q < id.length) 
            return find(p) == find(q);
        return false;
}

I have tried iterative way in Find:

    public int find(int i) {
        while (i != id[i]){
            id[i] = id[id[i]];
            i = id[i];              
        }       
    return i;
    }

and tail-recrusion:

    public int find(int i) {    
        int p = id[i];
        if (i == p) {
          return i;
        }
        return id[i] = find(p);     
     }

Is there anything I missed in my code? Is there any other approach to this kind of problems?

@edit: Adding constructor to code:

    public UnionFind(int N) {
        stevilo = N;
        id = new int[N];        
        for(int i = 0; i < N; i++){
            id[i] = i;
        }

@edit2 (better explanation and new findings): The problem is not in stackoverflow anymore for less then 60.000.000 elements, which is more then enough for solving my problems.

I'm calling test Unions like this:

for(i=0;i<id.length-1;i++)
unite(i,i+1)

so the ending pairs are like this:

0:1, 1:2, 2:3, 3:4,..

which is only example of least optimal option for testing means only :)

Then I check if representative of 0 is last element in table (99 for 100 elements) and it works.

Problem is, that my algorithm works only if initial elements are each in their own union (0:0, 1:1, 2:2, 3:3). If I have different Unions already set up (0:2, 1:6, 2:1, 3:5, ...) my testing algorithm stops working.

I have narrow it down to a problem in Find function, probably something to do with path compression

id[i] = id[id[i]].

It would be better to post a complete compilable example along with the error stacktrace. I suspect an infinite loop/recursion in your code. — cheseaux, Mar 29 '14 at 09:46
Your iterative `find` is not correct. It doesn't do path compression. If you had `1->2->3->4->5` you end up with `1->3,2->4,3->5` — Thomas Ahle, Sep 20 '15 at 02:24
Altough its over year and a half old question, you are correct, its it wrong, thats why I asked peeps here for help.. — SubjectX, Sep 20 '15 at 11:59

score 2 · Answer 1 · answered Mar 29 '14 at 09:49

2

One small optimization would be to get rid of isInSameSet...

public void unite(int p, int q) {
    if(p >= 0 && p < id.length && q >= 0 && q < id.length){
        int rootp = find(p);
        int rootq = find(q);
        if (rootp==rootq) return;
        id[rootp] = rootq; 
        stevilo--;
    }
}

answered Mar 29 '14 at 09:49

Ashalynd

12,363
2
34
37

True, I gained 0.2seconds before getting stackoverflow with 100000000 elements. But need to look further.. – SubjectX Mar 29 '14 at 09:58
If you post more of your code, you will get more help on optimizing it :) – Ashalynd Mar 29 '14 at 16:55
Added constructor (which makes this complete code that I have..) to first post, not sure if it will help. I'm testing this thing on making Unions in sequence, making one long tree, then trying to find last item in this "list".. – SubjectX Mar 29 '14 at 19:20
Actually no, I'm calling it like this: for(i=0;i – SubjectX Mar 31 '14 at 08:20
but then you end up with all nodes sitting in the same cluster? – Ashalynd Mar 31 '14 at 08:47

score 2 · Answer 2 · answered Mar 31 '14 at 15:35

2

Union-Find data structures typically include TWO different optimizations. One is path compression. You have that.

But the other optimization happens during a Union, where you carefully choose which of the two roots to make a child of the other, usually via Union-By-Rank or Union-By-Size. With that optimization, your trees should never be deep enough to get a stack overflow. However, that optimization seems to be missing from your unite function.

answered Mar 31 '14 at 15:35

Chris Okasaki

4,875
1
20
19

Adding weighting usually requires another array. With numbers this big, that array halves the amount of heap that I have available.. – SubjectX Apr 01 '14 at 06:55
Sure, although with Union-by-Rank, the ranks will never get very big. A byte is plenty big enough. An extra array of bytes would only increase your memory footprint by 25%, not double it. – Chris Okasaki Apr 01 '14 at 10:33

score 1 · Accepted Answer · answered Mar 31 '14 at 15:45

I once wrote an algorithm for UnionFind, and its time complexity is O(log*(n)). Thats iterative logarithm of n. The algorithm compresses the path of the tree as it keeps on connecting the nodes to gain efficiency. I find it very efficient, though I haven't practically tested it against huge array size. Here's the code:

public class UnionFind
{
  private int[] id;

  public UnionFind(int capacity)
  {
    id = new int[capacity];
    for (int i = 0; i < capacity; i++)
    {
      id[i] = i;
    }
  }

  public boolean isConnected(int p, int q)
  {
    return root(p) == root(q);
  }

  public void connect(int p, int q)
  {
    if (isConnected(p, q))
    {
      return;
    }

    id[root(p)] = root(q);
  }

  private int root(int p)
  {
    int temp = p;

    if (p != id[p] && id[id[p]] != id[p])
    {
      while (p != id[p])
      {
        p = id[p];
      }

      id[temp] = id[p];
    }

    return id[p];
  }
}

Dealing with Union-Find algorithms with a lot of objects

3 Answers3