
When implementing union-find, I would usually write the find function with path compression like this:

def find(x):
    if x != par[x]:
        par[x] = find(par[x])
    return par[x]

This is easy to remember and arguably easy to read. This is also how many books and websites describe the algorithm.

However, naively compiled, this uses stack space proportional to the depth of the tree, which in the worst case is linear in the input size. In many languages and systems that would by default result in a stack overflow.

The only non-recursive way I know of to write find is this:

def find(x):
    # First pass: walk up the parent pointers to the root.
    p = par[x]
    while p != par[p]:
        p = par[p]
    # Second pass: point every node on the path directly at the root.
    while x != p:
        x, par[x] = par[x], p
    return p

It seems unlikely that many compilers would find that. (Maybe Haskell would?)

My question is: in which cases is it safe to use the former version of find? If no widely used language can remove the recursion, shouldn't we tell people to use the iterative version? And is there perhaps a simpler iterative implementation?

Thomas Ahle

2 Answers


There seem to be two separate questions here.

First - can optimizing compilers notice this and rewrite it? It's difficult to answer this question without testing all compilers and all versions. I tried this out using gcc 4.8.4 on the following code:

size_t find(size_t uf[], size_t index) {
  if (index != uf[index]) {
    uf[index] = find(uf, uf[index]);
  }
  return uf[index];
}

void link(size_t uf[], size_t i, size_t j) {
  uf[find(uf, i)] = uf[find(uf, j)];
}

This doesn't implement the union-by-rank optimization, but does support path compression. I compiled this using optimization level -O3 and the assembly is shown here:

 find:
.LFB23:
    .cfi_startproc
    pushq   %r14
    .cfi_def_cfa_offset 16
    .cfi_offset 14, -16
    pushq   %r13
    .cfi_def_cfa_offset 24
    .cfi_offset 13, -24
    pushq   %r12
    .cfi_def_cfa_offset 32
    .cfi_offset 12, -32
    pushq   %rbp
    .cfi_def_cfa_offset 40
    .cfi_offset 6, -40
    pushq   %rbx
    .cfi_def_cfa_offset 48
    .cfi_offset 3, -48
    leaq    (%rdi,%rsi,8), %rbx
    movq    (%rbx), %rax
    cmpq    %rsi, %rax
    je  .L2
    leaq    (%rdi,%rax,8), %rbp
    movq    0(%rbp), %rdx
    cmpq    %rdx, %rax
    je  .L3
    leaq    (%rdi,%rdx,8), %r12
    movq    %rdx, %rax
    movq    (%r12), %rcx
    cmpq    %rcx, %rdx
    je  .L4
    leaq    (%rdi,%rcx,8), %r13
    movq    %rcx, %rax
    movq    0(%r13), %rdx
    cmpq    %rdx, %rcx
    je  .L5
    leaq    (%rdi,%rdx,8), %r14
    movq    %rdx, %rax
    movq    (%r14), %rsi
    cmpq    %rsi, %rdx
    je  .L6
    call    find           // <--- Recursion!
    movq    %rax, (%r14)
.L6:
    movq    %rax, 0(%r13)
.L5:
    movq    %rax, (%r12)
.L4:
    movq    %rax, 0(%rbp)
.L3:
    movq    %rax, (%rbx)
.L2:
    popq    %rbx
    .cfi_def_cfa_offset 40
    popq    %rbp
    .cfi_def_cfa_offset 32
    popq    %r12
    .cfi_def_cfa_offset 24
    popq    %r13
    .cfi_def_cfa_offset 16
    popq    %r14
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc

Given the recursive call in the middle, the recursion clearly wasn't eliminated: gcc unrolled the first few levels and then falls back to a genuine recursive call. (Note that this isn't a true tail call in the first place, since the result still has to be written back into uf[index] after the call returns.) In fairness, the transformation you're describing is pretty nontrivial, so I'm not surprised gcc didn't find it. This doesn't mean that no optimizing compiler can find it, but it does mean that one major one won't.
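
If you don't want to depend on the optimizer, the rewrite is easy enough to do by hand. Here is a minimal iterative sketch in the same style as the code above (the name find_iter is just illustrative, and I haven't timed it against the recursive version):

size_t find_iter(size_t uf[], size_t index) {
  // First pass: walk up the parent pointers to find the root.
  size_t root = index;
  while (root != uf[root]) {
    root = uf[root];
  }
  // Second pass: point every node on the path directly at the root.
  while (index != root) {
    size_t next = uf[index];
    uf[index] = root;
    index = next;
  }
  return root;
}

This uses a constant amount of extra memory regardless of how long the chain of parent pointers is, so there is no recursion depth to worry about.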

Your second question is why we present the algorithm this way. As someone who teaches both algorithms and programming, I think it's extremely valuable to discuss algorithms using a presentation that's as simple as possible, even if it means abstracting away some particular implementation details. Here, the key idea behind the algorithm is to update the parent pointers of all the nodes encountered on the way up to the representative. Recursion happens to be a pretty clean way of describing that idea, even though, when implemented naively, it risks a stack overflow. However, by expressing the pseudocode in that particular way, it's easier to describe and discuss the algorithm and to prove that it will work as advertised. We could describe it the other way to avoid a stack overflow, but in Theoryland we usually don't worry about details like that, and the updated presentation, while more directly translatable into practice, would make it harder to see the key ideas.

When looking at pseudocode for more advanced algorithms and data structures, it's common to omit critically important implementation details and to handwave that certain tasks are possible to do in certain time bounds. When discussing algorithms or data structures that build on top of even more complex algorithms and data structures, it often becomes impossible to write out pseudocode for everything because you have layers on top of layers on top of layers of glossed-over details. From a theoretical perspective, this is fine - if the reader does want to implement it, they can fill in the blanks. On the other hand, if the reader is more interested in the key techniques from the paper and the theory (which in academic settings is common), they won't get bogged down in implementation details.

templatetypedef
  • Thank you for the detailed answer. My second question was more about the algorithmic aspects of union-find. It might be possible to show that the recursion will never be very deep, and thus that the first version is perfectly safe to implement. I don't know of any such proof though, so for now I'll try to remember to code the iterative version instead. – Thomas Ahle Oct 17 '16 at 10:45
  • The recursion in this case can be linearly deep. Imagine always linking the head of one of the lists to a singleton list. This builds a chain of linear length. Now do a find on the deepest node. – templatetypedef Oct 17 '16 at 16:12
  • Good point. With 'rank' of course this wouldn't happen, but without it we have to be careful. – Thomas Ahle Oct 17 '16 at 17:18
  • Java may some day finally support Tail Call Optimisation but not yet: `https://stackoverflow.com/questions/53354898/tail-call-optimisation-in-java` – Vadim Kirilchuk Jun 02 '23 at 03:18

How about this?

while (x != root[x]) {
    root[x] = root[root[x]];  // re-point x to its grandparent
    x = root[x];              // and jump to that grandparent ("path halving")
}
return x;

Actually, Wikipedia has everything you need:

There are several algorithms for Find that achieve the asymptotically optimal time complexity. One family of algorithms, known as path compression, makes every node between the query node and the root point to the root. Path compression can be implemented using a simple recursion as follows:

function Find(x) is
    if x.parent ≠ x then
        x.parent := Find(x.parent)
        return x.parent
    else
        return x
    end if
end function

This implementation makes two passes, one up the tree and one back down. It requires enough scratch memory to store the path from the query node to the root (in the above pseudocode, the path is implicitly represented using the call stack). This can be decreased to a constant amount of memory by performing both passes in the same direction. The constant memory implementation walks from the query node to the root twice, once to find the root and once to update pointers:

function Find(x) is
    root := x
    while root.parent ≠ root do
        root := root.parent
    end while

    while x.parent ≠ root do
        parent := x.parent
        x.parent := root
        x := parent
    end while

    return root
end function

Tarjan and Van Leeuwen also developed one-pass Find algorithms that retain the same worst-case complexity but are more efficient in practice.[4] These are called path splitting and path halving. Both of these update the parent pointers of nodes on the path between the query node and the root. Path splitting replaces every parent pointer on that path by a pointer to the node's grandparent:

function Find(x) is
    while x.parent ≠ x do
        (x, x.parent) := (x.parent, x.parent.parent)
    end while
    return x
end function

Path halving works similarly but replaces only every other parent pointer:

function Find(x) is
    while x.parent ≠ x do
        x.parent := x.parent.parent
        x := x.parent
    end while
    return x
end function
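
For what it's worth, here is a rough translation of the path-splitting variant into the array-based style used elsewhere in this thread (the name find_split is mine and the sketch is untested):

size_t find_split(size_t uf[], size_t x) {
  while (x != uf[x]) {
    size_t parent = uf[x];
    uf[x] = uf[parent];  // re-point x to its grandparent (path splitting)
    x = parent;          // still step to the old parent, so every node is visited
  }
  return x;
}

Unlike path halving, which skips to the grandparent on every iteration, this visits and updates every node on the path.
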
Vadim Kirilchuk