1

What I have: an equivalence relation on the set {1,...,n}, given as the full list of equivalent pairs (a,b) (this is already transitive; no need to compute a transitive closure). n is large (say, a few millions at most); the total number of equivalent pairs is at most O(n) (and usually much less), and individual equivalence classes are small (I don't have a figure for this, but say O(log n)).

What I want: a way to individually enumerate all equivalence classes, i.e. successively get the full set of elements of each equivalence class; with complexity at most O~(n) (all other computations performed in my code are O(n log(n)) and I definitely don't want to do anything quadratic on that size if possible).

Using an union-find structure would help in constructing the transitive closure (which I don't need) but, I believe, not in constructing the equivalence classes. I can also easily compute (say) the smallest representative in each class (just scan the full equivalence relation until I find it) and even store all of them in time O(n) (build an identity vector of 1...n then scan the full equivalence relation, replacing by the smallest value if possible); however, this also does not help much in enumerating the equivalence classes.

Does there exist a classical solution to this problem?

Circonflexe
  • 351
  • 1
  • 9
  • OK, I believe I have something: first construct (as above) the vector `v` of smallest representatives, then build the sparse matrix with entries `(v[i], i)` (this is still `O(n)`); the rows of this matrix give the answer. – Circonflexe Mar 26 '21 at 15:16

1 Answers1

0

You can think of this as a graph search problem. You’re given a list of all the edges that make up a graph, and you want to find its connected components. If you had an adjacency list for the graph, this could be done with a DFS or BFS in time O(n), since there are n nodes and O(n) total edges.

Fortunately, you can construct an adjacency list in time O(n). Make an array of n lists in time O(n). Then, iterate across the edges and for each edge (a, b), append b to list number a.

Overall, this lets you find all the equivalence classes (connected components) in time O(n). This works even if the relation isn’t transitive.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065