You can use a data structure that makes merging more efficient. Here you create a kind of inverted tree, in which every node points to its parent. So in your example you first create a node for each number listed:
```
1   2   3   4   5   8   10
```
Now if you iterate over the `(1,2)` tuple, you look up `1` and `2` in a dictionary. You search for their ancestors (there are none here, so each number is its own root) and then you create a merge node on top of both:
```
1   2   3   4   5   8   10
 \ /
  12
```
Next we merge `(1,3)`, so we look up the ancestor of `1` (`12`) and of `3` (`3`) and perform another merge:
```
1   2   3   4   5   8   10
 \ /    |
  12    |
    \  /
     123
```
Next we merge `(2,4)`, `(5,8)` and `(8,10)`:
```
1   2   3   4   5   8   10
 \ /    |   |    \ /    |
  12    |   |     58    |
    \  /    |       \  /
     123    |       5810
        \  /
        1234
```
You also keep track of the "merge heads" (the current roots of the trees). That lets you easily return the elements of each group at the end.
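The walkthrough can be replayed in a few lines of plain Python, before introducing the `Merge` class. This is a minimal sketch under its own assumptions:

- The hypothetical helper `replay` represents each node as the tuple of leaf values below it (a leaf is a 1-tuple).
- It keeps the upward parent links in a dict and tracks the merge heads in a set.
- It labels each new merge node by concatenating its values, so the labels match the diagrams.

The input pairs are the ones used in the walkthrough.

```python
def replay(pairs):
    """Replay the merges; return the label of each new merge node and the final heads.

    Hypothetical helper: a node is the tuple of leaf values below it (a leaf is a
    1-tuple) and `parent` maps a node to the merge node created on top of it.
    """
    parent = {}
    heads = {(x,) for pair in pairs for x in pair}  # every leaf starts as a head
    labels = []

    def ancestor(node):
        # Walk upward until we reach a node without a parent (a head).
        while node in parent:
            node = parent[node]
        return node

    for a, b in pairs:
        ra, rb = ancestor((a,)), ancestor((b,))
        if ra == rb:  # already in the same group, nothing to merge
            continue
        new = ra + rb  # the merge node covers the leaves of both sides
        parent[ra] = parent[rb] = new
        heads -= {ra, rb}
        heads.add(new)
        labels.append("".join(map(str, new)))
    return labels, heads


pairs = [(1, 2), (1, 3), (2, 4), (5, 8), (8, 10)]
labels, heads = replay(pairs)
print(labels)         # ['12', '123', '1234', '58', '5810']
print(sorted(heads))  # [(1, 2, 3, 4), (5, 8, 10)]
```

The labels appear in the order the tuples are processed. The two remaining heads are exactly the two groups.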
Time to get our hands dirty
So now that we know how to construct such a data structure, let's implement one. First we define a node:
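```python
class Merge:
    """A node of the inverted tree: a leaf holds a value, a merge node holds subs."""

    def __init__(self, value=None, parent=None, subs=()):
        self.value = value    # the element itself (leaves only)
        self.parent = parent  # the merge node on top of this one, if any
        self.subs = subs      # the two merged nodes (merge nodes only)

    def get_ancestor(self):
        # Follow the parent links up to the root (the "merge head").
        cur = self
        while cur.parent is not None:
            cur = cur.parent
        return cur

    def __iter__(self):
        # Yield every leaf value below this node.
        if self.value is not None:
            yield self.value
        elif self.subs:
            for sub in self.subs:
                yield from sub
```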
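One caveat with the recursive `__iter__` above: every level of the tree adds a nested generator. A very deep tree (for example one built from a long chain of merges) may therefore run into Python's recursion limit.

Below is a minimal sketch of an alternative that walks the tree with an explicit stack instead. It assumes nodes with the same `value`/`subs` fields as the class above. The `leaves` function is a hypothetical helper, and the class is repeated so the snippet runs on its own.

```python
class Merge:
    """Same fields as the Merge class above (repeated so this snippet is standalone)."""

    def __init__(self, value=None, parent=None, subs=()):
        self.value = value
        self.parent = parent
        self.subs = subs


def leaves(node):
    """Hypothetical helper: yield the leaf values under node without recursion."""
    stack = [node]
    while stack:
        cur = stack.pop()
        if cur.value is not None:
            yield cur.value
        else:
            # Push the children in reverse so they are visited left to right.
            stack.extend(reversed(cur.subs))


# Stage two of the walkthrough: 1 and 2 merged first, then merged with 3.
root = Merge(subs=(Merge(subs=(Merge(1), Merge(2))), Merge(3)))
print(list(leaves(root)))  # [1, 2, 3]

# A chain of 5000 merges, the shape that makes the tree deepest.
deep = Merge(0)
for i in range(1, 5000):
    deep = Merge(subs=(deep, Merge(i)))
print(sum(1 for _ in leaves(deep)))  # 5000
```

Because the stack lives on the heap, the depth of the tree no longer matters. Only the number of nodes does.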
First we collect every distinct element that occurs in your list of tuples:

```python
vals = {x for tup in array for x in tup}
```

then create a dictionary that maps every element in `vals` to its own `Merge` leaf:

```python
dic = {val: Merge(val) for val in vals}
```

and initialize the set of `merge_heads` (at first, every leaf is its own head):

```python
merge_heads = set(dic.values())
```
Now, for each tuple in the array:

1. Look up the `Merge` objects that are the ancestors of both elements.
2. Create a new `Merge` on top of them.
3. Remove the two old heads from the `merge_heads` set.
4. Add the new merge to that set.

```python
for frm, to in array:
    mra = dic[frm].get_ancestor()
    mrb = dic[to].get_ancestor()
    if mra is mrb:
        # Both elements are already in the same group; merging would
        # remove the same head twice and raise a KeyError.
        continue
    mr = Merge(subs=(mra, mrb))
    mra.parent = mr
    mrb.parent = mr
    merge_heads.remove(mra)
    merge_heads.remove(mrb)
    merge_heads.add(mr)
```
Finally, once all tuples are processed, we can simply construct a `set` for each `Merge` in `merge_heads`:

```python
resulting_sets = [set(merge) for merge in merge_heads]
```

and `resulting_sets` will be (order may vary):

```
[{1, 2, 3, 4}, {8, 10, 5}]
```
Putting it all together (without the `class` definition):

```python
# Example input: the pairs from the walkthrough above.
array = [(1, 2), (1, 3), (2, 4), (5, 8), (8, 10)]

vals = {x for tup in array for x in tup}
dic = {val: Merge(val) for val in vals}
merge_heads = set(dic.values())
for frm, to in array:
    mra = dic[frm].get_ancestor()
    mrb = dic[to].get_ancestor()
    if mra is mrb:
        continue  # already in the same group, nothing to merge
    mr = Merge(subs=(mra, mrb))
    mra.parent = mr
    mrb.parent = mr
    merge_heads.remove(mra)
    merge_heads.remove(mrb)
    merge_heads.add(mr)
resulting_sets = [set(merge) for merge in merge_heads]
```
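In the worst case this runs in O(n²). For example, if every tuple merges one more element into the same growing group, each `get_ancestor` call has to walk up an ever longer chain of parents. You can balance the tree so that the ancestor is found in O(log n) instead, making the whole run O(n log n). Furthermore, you can short-circuit the chain of ancestors (let each visited node point directly to its head), making it even faster.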
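Both optimizations are the standard tricks of a disjoint-set (union-find) forest. Union by size (or rank) keeps the trees shallow, and path compression short-circuits the chain of ancestors.

Below is a minimal sketch of that variant. It swaps the `Merge` tree for a plain `parent` dict and hangs one head directly under the other instead of creating a new merge node. `merge_groups` is a hypothetical name.

```python
def merge_groups(pairs):
    """Group the values connected by pairs (disjoint-set forest sketch)."""
    parent = {}  # value -> parent value; a head is its own parent
    size = {}    # head -> number of values in its group

    def find(x):
        # First pass: walk up to the head of x's group.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Second pass (path compression): point every node on the path at the head.
        while parent[x] != root:
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    for a, b in pairs:
        for x in (a, b):
            if x not in parent:  # first time we see x: it forms its own group
                parent[x] = x
                size[x] = 1
        ra, rb = find(a), find(b)
        if ra == rb:  # already in the same group
            continue
        # Union by size: hang the smaller group's head under the larger one.
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


pairs = [(1, 2), (1, 3), (2, 4), (5, 8), (8, 10)]
print(sorted(sorted(g) for g in merge_groups(pairs)))  # [[1, 2, 3, 4], [5, 8, 10]]
```

Unlike the `Merge` tree, this version does not keep the merge history (which pair created which node). If you need the tree itself, it is better to keep the approach above and add the two optimizations to it.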