I am not sure whether "parent" in the highlighted part refers to a "defaultdict" object that the user needs to input
No. The defaultdict is the graph that the user inputs. parent is the tree that represents the connected components of the graph: it is any tree that connects all nodes of the same connected component. There are multiple ways to construct such a tree, and some are better than others.
For example, you have a graph consisting of 2 components:
    1--2
    |  |
    3--4    5---6
The tree (more accurately, the forest) can be
      1       5        parent[2] = 1, parent[3] = 1, parent[4] = 3, parent[6] = 5
     / \       \       parent[1] = parent[5] = -1
    2   3       6
         \
          4
it can also be
    1       5          parent[2] = 1, parent[3] = 2, parent[4] = 3, parent[6] = 5
     \       \         parent[1] = parent[5] = -1
      2       6
       \
        3
         \
          4
The first version is better because its tree is shorter in height. In fact, the best forest to represent the components of the above graph is:
       1        5
     / | \      |
    2  3  4     6
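To make the height comparison concrete, here is a small sketch that writes the three forests above as parent dictionaries and measures the longest chain of parent links in each. The helper name height and the use of plain dicts are my own choices for illustration, not something from the code in the question.

```python
def height(parent):
    """Longest chain of parent links from any node up to its root."""
    def depth(u):
        d = 0
        while parent[u] != -1:  # -1 marks a root
            u = parent[u]
            d += 1
        return d
    return max(depth(u) for u in parent)

# The three forests drawn above, written as parent maps.
first = {1: -1, 2: 1, 3: 1, 4: 3, 5: -1, 6: 5}
second = {1: -1, 2: 1, 3: 2, 4: 3, 5: -1, 6: 5}
best = {1: -1, 2: 1, 3: 1, 4: 1, 5: -1, 6: 5}

print(height(first), height(second), height(best))  # 2 3 1
```

All three describe the same two components; only the number of steps needed to reach a root differs.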
Formally speaking:
parent[u] == -1 means u is the root of its tree.
parent[u] == v means u and v belong to the same connected component. Note, however, that the converse is not true: u and v can belong to the same connected component even though parent[u] != v. Why? Because it may be that parent[u] == w and parent[w] == v, in which case u and v are connected through w.
- So how do you determine whether two nodes u and v are connected? Find the roots of the trees that contain u and v. If the two roots are the same, then u and v belong to the same tree and are connected.
- How do you find the root? Keep moving up to the parent node until you reach the root (the node whose parent is -1).
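The two steps above can be sketched in a few lines. This is only an illustration: the function names find_root and connected are my own, and parent is written as a plain dict holding the first example forest.

```python
# Parent map of the first example forest; -1 marks a root.
parent = {1: -1, 2: 1, 3: 1, 4: 3, 5: -1, 6: 5}

def find_root(parent, u):
    """Walk up the parent links until reaching a node whose parent is -1."""
    while parent[u] != -1:
        u = parent[u]
    return u

def connected(parent, u, v):
    """Two nodes are connected exactly when their trees share a root."""
    return find_root(parent, u) == find_root(parent, v)

print(connected(parent, 2, 4))  # True: both reach root 1
print(connected(parent, 4, 6))  # False: roots are 1 and 5
```

Note that parent[2] != 4 and parent[4] != 2, yet the two nodes are still reported as connected because both walks end at the same root.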
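There are also some tricks to shorten the height of the trees while still representing the same components. Normally, the trees of a union-find structure can and should be kept very shallow: with union by rank the height is at most log2(N), and adding path compression brings the amortized cost of a root lookup down to the order of log*(N) (where log* is the iterated logarithm).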
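Two commonly used tricks of this kind are path compression and union by rank. The sketch below is one possible way to write them while keeping the -1 root convention. The names find, union and rank, and the edge list used to build the example graph, are assumptions for illustration rather than the code from the question.

```python
from collections import defaultdict

def find(parent, u):
    """Return the root of u and point every node on the path directly at it."""
    root = u
    while parent[root] != -1:
        root = parent[root]
    # Path compression: re-link each visited node straight to the root.
    while u != root:
        nxt = parent[u]
        parent[u] = root
        u = nxt
    return root

def union(parent, rank, u, v):
    """Merge the trees of u and v; return False if they were already joined."""
    ru, rv = find(parent, u), find(parent, v)
    if ru == rv:
        return False
    # Union by rank: hang the shallower tree under the deeper one.
    if rank[ru] < rank[rv]:
        ru, rv = rv, ru
    parent[rv] = ru
    if rank[ru] == rank[rv]:
        rank[ru] += 1
    return True

# The example graph (edges 1-2, 1-3, 2-4, 3-4, 5-6) as a defaultdict.
graph = defaultdict(list)
for a, b in [(1, 2), (1, 3), (2, 4), (3, 4), (5, 6)]:
    graph[a].append(b)
    graph[b].append(a)

parent = {node: -1 for node in graph}
rank = {node: 0 for node in graph}
for u in list(graph):
    for v in graph[u]:
        union(parent, rank, u, v)

print(parent)
```

Here the graph is the user's input and parent is built from it, which is the distinction the question was asking about.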