Description
We downloaded a dataset for our research that contains hierarchical data. However, its creators weren't consistent at all. For example, sometimes we have something like:
term1:term2:term3:term4
whereas in other cases we only have:
term4
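To compare both forms, each record can first be split into a list of terms. This is a minimal sketch that assumes the raw records are strings using ":" as the separator, as in the two examples above; `parse_record` and `records` are hypothetical names, not part of the dataset.

```python
def parse_record(record, sep=":"):
    """Split one record into a list of stripped, non-empty terms."""
    return [term.strip() for term in record.split(sep) if term.strip()]


# Assumed raw input: one full path and one record holding only the last term.
records = ["term1:term2:term3:term4", "term4"]
rows = [parse_record(record) for record in records]
print(rows)
```

A record that holds only the last term becomes a one-element list, so it carries no ordering information by itself and has to be placed using the other records.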
Example data
As an example, let's look at this dataset:
data = [['root', 'test', 'coffee'],
        ['root', 'test', 'gains'],
        ['root', 'gains', 'coffee'],
        ['root', 'milk', 'bread']]
Now I want to write code that deciphers the complete hierarchy (or at least as much of it as possible) based on this data and prints each branch down to its end point:
root:test:gains:coffee
root:milk:bread
I'm pretty sure there is a fairly simple trick to do this, but I haven't found one yet. What I have tried so far:
- Starting with the longest branch (which one doesn't matter in this case) and then adding a new branch whenever I encountered terms that couldn't be fitted into the starting branch.
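One possible alternative to growing a single starting branch is to treat every pair of consecutive terms as an "comes before" edge, drop each edge that is already implied by a longer route, and then walk from the roots to the end points. This is only a sketch under assumptions: the rows must not contradict each other (no cycles), and all function names below are made up for illustration.

```python
def build_edges(rows):
    """Map each term to the terms seen directly after it, in first-seen order."""
    children = {}
    for row in rows:
        for term in row:
            children.setdefault(term, [])
        for parent, child in zip(row, row[1:]):
            if child not in children[parent]:
                children[parent].append(child)
    return children


def reachable(children, start):
    """Return every term that can be reached from start via one or more edges."""
    seen, stack = set(), list(children[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return seen


def reduce_edges(children):
    """Drop edge a->b when b is also reachable through another child of a."""
    return {
        node: [kid for kid in kids
               if not any(kid in reachable(children, other)
                          for other in kids if other != kid)]
        for node, kids in children.items()
    }


def infer_branches(rows):
    """Return every root-to-end-point path as a colon-separated string."""
    reduced = reduce_edges(build_edges(rows))
    has_parent = {kid for kids in reduced.values() for kid in kids}
    result = []

    def walk(node, path):
        path = path + [node]
        if not reduced[node]:          # no children left: this is an end point
            result.append(":".join(path))
        for kid in reduced[node]:
            walk(kid, path)

    for node in reduced:
        if node not in has_parent:     # a term that never follows another term
            walk(node, [])
    return result


data = [['root', 'test', 'coffee'],
        ['root', 'test', 'gains'],
        ['root', 'gains', 'coffee'],
        ['root', 'milk', 'bread']]

for branch in infer_branches(data):
    print(branch)
# prints:
# root:test:gains:coffee
# root:milk:bread
```

For the example data, the edge root->gains is dropped because gains is already reachable via test, and test->coffee is dropped because coffee is reachable via gains. If a term ends up with two different parents after this reduction, it simply appears on more than one printed branch; whether that is acceptable depends on the real dataset.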