Python: Infer item hierarchy from branches

Question

Description
We downloaded a dataset for our research that contains hierarchical data. However, its makers weren't consistent at all in how they recorded the hierarchy. For example, sometimes we have something like:

term1:term2:term3:term4

whereas in other cases we only have:

term4

Example data
As an example, let's look at this dataset:

data = [['root', 'test', 'coffee'],
        ['root', 'test', 'gains'],
        ['root', 'gains', 'coffee'],
        ['root', 'milk', 'bread']]

Now I want to write code that deciphers the complete hierarchy (or at least as much of it as possible) from this data and prints the branches up to the end points:

root:test:gains:coffee
root:milk:bread

I'm pretty sure there is a fairly simple trick for this, but I haven't found one yet. What I tried is:

Starting with the longest branch (which one doesn't matter in this case) and then adding new branches whenever I encountered terms that couldn't be fitted into the starting branch.
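
One way to make this idea precise, offered as a sketch rather than a definitive solution: treat each pair of neighbouring terms in a branch as a "comes before" edge, drop every edge that is already implied by a longer route (a transitive reduction), and then walk from the roots to the end points. The function name `infer_branches` is made up for illustration, and the sketch assumes the branches never contradict each other; if they do, it raises an error instead of guessing.

```python
def infer_branches(data):
    """Merge partial branches into root-to-leaf paths (hypothetical helper)."""
    # children[term] is a dict used as an insertion-ordered set of the
    # terms seen directly after `term` in any branch.
    children = {}
    for branch in data:
        for term in branch:
            children.setdefault(term, {})
        for parent, child in zip(branch, branch[1:]):
            children[parent][child] = True

    # reach[term] = every term that lies somewhere below `term`.
    reach = {}

    def visit(node, stack):
        if node in stack:
            # e.g. both a:b:c and a:c:b were seen
            raise ValueError(f"conflicting order involving {node!r}")
        if node not in reach:
            stack.add(node)
            below = set()
            for child in children[node]:
                below.add(child)
                below |= visit(child, stack)
            stack.discard(node)
            reach[node] = below
        return reach[node]

    for node in children:
        visit(node, set())

    # Transitive reduction: keep parent -> child only if the child is not
    # also reachable through one of the parent's other children.
    direct = {
        node: [c for c in kids
               if not any(c in reach[other] for other in kids if other != c)]
        for node, kids in children.items()
    }

    has_parent = {c for kids in direct.values() for c in kids}
    roots = [node for node in children if node not in has_parent]

    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        if not direct[node]:          # end point reached
            paths.append(prefix)
        for child in direct[node]:
            walk(child, prefix)

    for root in roots:
        walk(root, [])
    return paths


data = [['root', 'test', 'coffee'],
        ['root', 'test', 'gains'],
        ['root', 'gains', 'coffee'],
        ['root', 'milk', 'bread']]

for path in infer_branches(data):
    print(':'.join(path))
# root:test:gains:coffee
# root:milk:bread
```

Both helper functions are recursive, so a very deep hierarchy could hit Python's recursion limit; an iterative traversal would be needed in that case. A term that legitimately sits under two unrelated parents will simply show up in more than one printed branch.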

The way it is, it is not clearly defined what the rules are. For example, what should it do if it encounters both `a:b:c` and `a:c:b`? Should it just abort saying it is not possible? — zvone, Aug 30 '18 at 17:31
My dataset is too huge to know that beforehand, but let's assume (and hope) that it doesn't happen @zvone — CodeNoob, Aug 30 '18 at 17:37
I think I would try to solve it using [C3 Linearization](https://en.wikipedia.org/wiki/C3_linearization) - the same mechanism which is used by Python for the MRO. It looks like the same type of problem. — zvone, Aug 30 '18 at 19:12
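
To illustrate the C3 suggestion, here is a minimal sketch of only the merge step of C3, applied directly to the branches. The name `c3_merge` is made up for this example. Note that this yields a single ordering consistent with every branch, not the separate branches the question asks for, and it raises an error for contradictory input such as `a:b:c` together with `a:c:b`.

```python
def c3_merge(sequences):
    """Merge ordered lists into one order that respects all of them
    (hypothetical helper; the merge step of C3 only)."""
    seqs = [list(s) for s in sequences if s]
    result = []
    while seqs:
        for seq in seqs:
            head = seq[0]
            # A head is acceptable if no other list still expects
            # something to come before it.
            if not any(head in other[1:] for other in seqs):
                break
        else:
            raise ValueError("branches give conflicting orders")
        result.append(head)
        # Drop the chosen head wherever it leads a list, then drop
        # lists that have become empty.
        seqs = [s[1:] if s[0] == head else s for s in seqs]
        seqs = [s for s in seqs if s]
    return result


data = [['root', 'test', 'coffee'],
        ['root', 'test', 'gains'],
        ['root', 'gains', 'coffee'],
        ['root', 'milk', 'bread']]

print(c3_merge(data))
# ['root', 'test', 'gains', 'coffee', 'milk', 'bread']
```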
