
I've got a set of data like this:

data = { 1: {"root": [2],
             "leaf": [10, 11, 12],
             },
         2: {"root": [1,3],
             "leaf": [13, 14, 15],
             },
         3: { "root": [2],
              "leaf": [16, 17],
            },
         4: {"root": [],
             "leaf": [17, 18, 19],
             },
         5: { "root": [],
              "leaf": [20, 21]
             },
       }

In this data, each top-level key is a root node index, and its value is a dictionary listing which root nodes and leaf nodes are related to it.

I want to merge all indexes into related lists.

  • When a root index is connected to another root index, all of those root indexes and all of their leaf indexes are merged into one resulting list.
  • A root index may also be connected to another root via a shared leaf; in that case the root indexes and all of their leaf indexes are likewise merged.

I'm having a bit of trouble figuring out the best way to traverse and merge the data. From the above data set, the expected output is:

[[1, 2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [5, 20, 21]]
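The two rules amount to computing connected components of a graph whose nodes are all the root and leaf indexes. As a sanity check, a minimal breadth-first sketch (standard library only, using the data above) reproduces the expected output:

```python
from collections import deque

# The data set from the question.
data = {1: {"root": [2], "leaf": [10, 11, 12]},
        2: {"root": [1, 3], "leaf": [13, 14, 15]},
        3: {"root": [2], "leaf": [16, 17]},
        4: {"root": [], "leaf": [17, 18, 19]},
        5: {"root": [], "leaf": [20, 21]}}

# Build an undirected adjacency map: each root index links to its
# related roots and leaves, and every link is mirrored back.
adjacency = {}
for k, v in data.items():
    adjacency.setdefault(k, set())
    for other in v["root"] + v["leaf"]:
        adjacency[k].add(other)
        adjacency.setdefault(other, set()).add(k)

# Breadth-first search collects one connected component per unseen node.
seen = set()
components = []
for node in adjacency:
    if node in seen:
        continue
    queue, component = deque([node]), set()
    while queue:
        n = queue.popleft()
        if n in component:
            continue
        component.add(n)
        queue.extend(adjacency[n] - component)
    seen |= component
    components.append(sorted(component))

print(components)
# [[1, 2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [5, 20, 21]]
```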

Fixed attempt; it seems to work, but is there a more efficient method?

class MergeMachine(object):
    def __init__(self):
        # Instance attribute: a class-level list would be shared across
        # instances and keep growing between process() calls.
        self.processed = []

    def merge(self, idx, parent_indexes, existing):
        if idx not in self.processed:
            parent_indexes.append(idx)
            for related_root_idx in self.data[idx]["root"]:
                if related_root_idx not in self.processed and related_root_idx not in parent_indexes:
                    # merge() mutates and returns `existing` itself, so
                    # extending `existing` with the return value would
                    # duplicate every entry.
                    self.merge(related_root_idx, parent_indexes, existing)
                    self.processed.append(related_root_idx)
            existing.append(idx)
            existing.extend(self.data[idx]["leaf"])
            self.processed.append(idx)
        return existing

    def process(self, data):
        results = []
        self.data = data
        for root_idx in self.data:
            r = set(self.merge(root_idx, [], []))
            if r:
                combined = False
                for result_set in results:
                    if not r.isdisjoint(result_set):
                        # update() mutates in place; union() returns a new
                        # set and would silently discard the merge.
                        result_set.update(r)
                        combined = True
                if not combined:
                    results.append(r)
        return results

mm = MergeMachine()
mm.process(data)

Is there an efficient way to merge the data into the expected output?

madth3
monkut

2 Answers


I have no idea if this is efficient, but it seems to work:

data = # your data as posted

data = [{k} | set(v['root']) | set(v['leaf']) for k, v in data.items()]
merged = []
while data:
    e0 = data[0]
    for idx, e in enumerate(data[1:]):
        if e0 & e:
            data[idx + 1] = e | e0  # idx is off by 1 as I enumerate data[1:]
            break
    else:
        merged.append(e0)
    data = data[1:]

print(merged)

I guess that in a worst-case scenario (i.e. no possible merge) the cost should be O(n**2). And it is iterative, with no recursion.
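For comparison, the same merging can be done with a disjoint-set (union-find) structure, a standard technique that avoids the repeated pairwise set intersections and runs in near-linear time. A minimal sketch, assuming the same `data` layout as in the question (the function name is mine):

```python
def merge_union_find(data):
    # parent maps each index to a representative; path halving in find()
    # keeps lookups effectively constant-time.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Union every root index with all of its related roots and leaves.
    for k, v in data.items():
        find(k)  # ensure isolated keys still appear in the result
        for other in v["root"] + v["leaf"]:
            union(k, other)

    # Group all indexes by their final representative.
    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return [sorted(g) for g in groups.values()]
```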

Hyperboreus

I came up with this, which is similar to, but not quite the same as, the one above. Mine is destructive; it consumes the input data structure, and I think it is bounded at the same point (O(n**2) in the event that none of the input data is related).

def merge(data):
    result = []
    while data:
        k, v = data.popitem()
        temp = set([k]) | set(v['root']) | set(v['leaf'])
        # A new set can bridge two previously separate result sets, so
        # pull in every set it touches rather than only the first match.
        touching = [s for s in result if s & temp]
        for s in touching:
            temp |= s
            result.remove(s)
        result.append(temp)
    return result
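Because the function consumes its argument via `popitem()`, callers that still need `data` afterwards should pass a shallow copy; `dict(data)` is enough here because the inner dictionaries are only read, never mutated. A small sketch of the copy behavior (the one-entry dict is just an illustration):

```python
# merge()-style consumption: popitem() empties the dict it is given.
original = {1: {"root": [], "leaf": [10]}}
working = dict(original)  # shallow copy protects the original mapping

while working:
    working.popitem()

assert original == {1: {"root": [], "leaf": [10]}}  # original intact
assert working == {}                                # the copy was consumed
```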
g.d.d.c