2

I am trying to create a disjoint set structure in Rust. It looks like this

struct DisjointSet<'s> {
    id: usize,
    parent: &'s mut DisjointSet<'s>,
}

The default disjoint set is a singleton structure, in which the parent refers to itself. Hence, I would like to have the option to do the following:

let a: DisjointSet = DisjointSet {
    id: id,
    parent: self,
};

where the self is a reference to the object that will be created.

I have tried working around this issue by creating a custom constructor. However, my attempts failed because partial initialization is not allowed. The compiler suggests using Option<DisjointSet<'s>>, but this is quite ugly. Do you have any suggestions?

My question differs from Structure containing fields that know each other because I am interested in getting the reference to the struct that will be created.

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Ben Ruijl
  • 4,973
  • 3
  • 31
  • 44
  • Have you tried making a `new` function that constructs an instance and then patches it up before returning? – Sage Mitchell Apr 01 '15 at 21:28
  • possible duplicate of [Structure containing fields that know each other](http://stackoverflow.com/questions/28113504/structure-containing-fields-that-know-each-other) – Shepmaster Apr 01 '15 at 21:52
  • 1
    `&mut` won't work anyway for parent references anyway, since it does not allow aliasing. `&` allows that but still requires some gymnastics (and a `Cell` if you want to change `id` after construction). It also puts weird requirements on the collection of all `DisjointSet`s (can't have `Vec>` for example). A disjoint set data structure is effectively a DAG, i.e. a graph, and graphs are hairy to encode in Rust. I'm inclined to recommend the traditional implementation via indices (`parent: usize`, implicitly referring to a `Vec` somewhere else). –  Apr 01 '15 at 22:18

1 Answers1

5

As @delnan says, at their core, these sort of data structures are directed acyclic graphs (DAGs), with all the sharing that entails. Rust is strict about what sharing can happen so it takes a bit of extra effort to convince the compiler to accept your code in this case.

Fortunately though, "all the sharing that entails" isn't literally "all the sharing": a DAG is acyclic (modulo wanting to have parent: self), so a reference counting type like Rc or Arc is a perfect way to handle the sharing (reference counting is not so good if there are cycles). Specifically:

struct DisjointSet {
    id: Cell<usize>,
    parent: Rc<DisjointSet>,
}

The Cell has zero runtime overhead (there is definitely some syntactic overhead) for such a small type.

Unfortunately, this still isn't quite right for the same reason that the compiler suggests using Option<...>. There's no way to create the first DisjointSet. However, the suggested fix still works:

struct DisjointSet {
    id: Cell<usize>,
    parent: Option<Rc<DisjointSet>>,
}

(The Option<...> is free: Option<Rc<...>> is a single nullable pointer, just like Rc<...> is a single non-nullable pointer, and presumably one would need a branch on "do I have a parent or not" anyway.)

If you are going to take this approach, I would recommend not trying to use the Option for partial initialisation, but instead use it to represent the fact that the given set is a "root". It is easy to traverse up a chain with this representation, e.g.

fn find_root(mut x: &DisjointSet) -> &DisjointSet {
    while let Some(ref parent) = x.parent {
        x = parent
    }
    x
}

The same approach should work fine with references, but the lifetimes can often be hard to juggle.

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
huon
  • 94,605
  • 21
  • 231
  • 225