I have a set of half a million items stored in the database and need the following operations:
- `union(x, y)`: just like in Union-Find
- `findAll(x)`: finding all `y` such that `find(x) == find(y)`
- `ununion(x, y)`: reverting a former `union` operation
This is a practical problem, for which the following is known:
- The partitions will typically be small (fewer than 100 elements), but there's no guarantee.
- The speed of the `union` operation doesn't matter much. `findAll` has to be fast and needs to be implemented in SQL (without recursion / `CONNECT BY`).
- Sometimes we find out that some `union` was actually wrong and need to undo it, while keeping all the previous and following `union`s. This operation is rare enough that its speed doesn't matter.
- It's not necessary that `findAll` sees changes done by the other operations immediately. Some post-processing would be OK.
The classical Union-Find algorithm needs path compression (or a variant) for efficiency and allows no edge deletions (even without path compression). I'm aware of dynamic connectivity, but it doesn't look applicable to my use case.
I guess we can't use it, as the speed of `findAll` is the most important. We should probably link all nodes to the root directly.
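To make the "link all nodes to the root directly" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `item`, its columns `id` and `root`, the index name, and the helper names are all hypothetical, chosen only for illustration; the real schema and database will differ. Every row stores its partition's root directly, so `findAll` becomes a single indexed self-join with no recursion.

```python
import sqlite3

def make_db(n):
    # Hypothetical schema: each item points straight at its partition root.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, root INTEGER NOT NULL)")
    db.execute("CREATE INDEX item_root ON item (root)")
    # Initially every item is its own root (singleton partitions).
    db.executemany("INSERT INTO item (id, root) VALUES (?, ?)",
                   ((i, i) for i in range(n)))
    return db

def find(db, x):
    # One primary-key lookup, no pointer chasing.
    return db.execute("SELECT root FROM item WHERE id = ?", (x,)).fetchone()[0]

def union(db, x, y):
    # Slow path: relabel every member of y's partition with x's root.
    rx, ry = find(db, x), find(db, y)
    if rx != ry:
        db.execute("UPDATE item SET root = ? WHERE root = ?", (rx, ry))

def find_all(db, x):
    # Fast path: plain SQL self-join on the indexed root column.
    rows = db.execute(
        "SELECT b.id FROM item AS a JOIN item AS b ON b.root = a.root "
        "WHERE a.id = ? ORDER BY b.id",
        (x,),
    )
    return [r[0] for r in rows]
```

With this layout `union` costs time proportional to the size of the relabelled partition, which should be acceptable given that its speed doesn't matter much.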
Concerning `ununion`, my only idea is to store all `union` operations separately and, on `ununion`, remove all links from the corresponding partition and redo all related `union`s.
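A rough sketch of this brute-force idea, again with Python's built-in `sqlite3` module and hypothetical names (`item`, `union_log`, `relink`, and so on are assumptions for illustration only). Every `union` is logged; `ununion` deletes the log entry, resets the affected partition to singletons, and replays the remaining logged unions that touch that partition.

```python
import sqlite3

def make_db(n):
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, root INTEGER NOT NULL);
        CREATE INDEX item_root ON item (root);
        CREATE TABLE union_log (x INTEGER NOT NULL, y INTEGER NOT NULL);
    """)
    db.executemany("INSERT INTO item (id, root) VALUES (?, ?)",
                   ((i, i) for i in range(n)))
    return db

def find(db, x):
    return db.execute("SELECT root FROM item WHERE id = ?", (x,)).fetchone()[0]

def relink(db, x, y):
    # Merge two partitions by relabelling y's members with x's root.
    rx, ry = find(db, x), find(db, y)
    if rx != ry:
        db.execute("UPDATE item SET root = ? WHERE root = ?", (rx, ry))

def union(db, x, y):
    # Remember the operation so it can be replayed later.
    db.execute("INSERT INTO union_log (x, y) VALUES (?, ?)", (x, y))
    relink(db, x, y)

def ununion(db, x, y):
    root = find(db, x)
    # Forget the wrong union (in either argument order).
    db.execute(
        "DELETE FROM union_log WHERE (x = ? AND y = ?) OR (x = ? AND y = ?)",
        (x, y, y, x),
    )
    # Any logged union with one endpoint in this partition lies entirely in it.
    edges = db.execute(
        "SELECT l.x, l.y FROM union_log AS l "
        "JOIN item AS i ON i.id = l.x WHERE i.root = ?",
        (root,),
    ).fetchall()
    # Split the partition into singletons, then redo the remaining unions.
    db.execute("UPDATE item SET root = id WHERE root = ?", (root,))
    for a, b in edges:
        relink(db, a, b)

def find_all(db, x):
    rows = db.execute(
        "SELECT b.id FROM item AS a JOIN item AS b ON b.root = a.root "
        "WHERE a.id = ? ORDER BY b.id",
        (x,),
    )
    return [r[0] for r in rows]
```

Note that in this sketch a partition stays connected after `ununion` whenever other logged unions still link its members, and that duplicate log entries for the same pair are all removed at once.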
This sounds rather brute-force...
Before I start implementing anything: is there a smarter algorithm?