
I'd like to build a pandas DataFrame (or a tuple) from an anytree tree in which each node has a list attribute, members:

from anytree import Node, RenderTree, find_by_attr
from anytree.exporter import DictExporter
from collections import OrderedDict
import pandas as pd
import numpy as np

tree = Node('T0C0',
        n=1000,
        tier=0,
        members=['A','B','C','D'])

Node('T0C0.T1C0',
     parent=find_by_attr(tree, 'T0C0'),
     n=400,
     tier=1,
     members=['B','C'])

Node('T0C0.T1C1',
     parent=find_by_attr(tree, 'T0C0'),
     n=600,
     tier=1,
     members=['A','D'])

Node('T0C0.T1C1.T2C0',
     parent=find_by_attr(tree, 'T0C0.T1C1'),
     n=300,
     tier=2,
     members=['D'])

Node('T0C0.T1C1.T2C1',
     parent=find_by_attr(tree, 'T0C0.T1C1'),
     n=300,
     tier=2,
     members=['A'])

My goal is to produce a DataFrame of end (leaf) nodes per member or, even better, one column per tier showing which cluster each member belongs to at that tier, like the following:

pd.DataFrame(data=np.array([['T0C0.T1C1.T2C1','T0C0.T1C0','T0C0.T1C0','T0C0.T1C1.T2C0'],
                           ['T0C0','T0C0','T0C0','T0C0'],
                           ['T0C0.T1C1','T0C0.T1C0','T0C0.T1C0','T0C0.T1C1'],
                           ['T0C0.T1C1.T2C1',None,None,'T0C0.T1C1.T2C0']]
                          ).T,  # each inner list is one column (EndCluster, tier0, ...), so transpose to one row per member
             index=['A','B','C','D'],columns=['EndCluster','tier0','tier1','tier2'])

I've tried exporting to an OrderedDict and to JSON and building DataFrames directly from those, but "children" becomes a column in the resulting DataFrame, holding nested OrderedDict entries. I can't find a way to unnest it. Thank you for any help!

The answer turned out to be easier than I thought. First, grab all the end (leaf) nodes using anytree's findall():

from anytree import findall
endnodes = findall(tree, filter_=lambda node: len(node.children) == 0)

This returns a tuple of nodes, which is easier to work with here than anytree's OrderedDict export.

Finally, populate the DataFrame by repeating each leaf's attributes once per member, i.e. len(item.members) times:

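members = []
tier = []
endcluster = []
for item in endnodes:
    # repeat the leaf's tier and name once per member so the three lists stay aligned
    members += item.members
    tier += [item.tier] * len(item.members)
    endcluster += [item.name] * len(item.members)
# one row per member, indexed by member name
endf = pd.DataFrame(index=members)
endf['tier'] = tier
endf['endcluster'] = endcluster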