I have Network data structure like this:
data Network = Empty | Node Double [Network] [Network]
Where the Double represents the node label, the first list is the parents, and the 2nd list is the children of a given node. Assume all edges are directed from root to leaves, and there are no cycles.
I have an edge data structure like this:
data Edge = Edge (Network, Network)
where the edge is directed from a->b for Edge (a,b)
I have written a function meant to make a list of all the tree edges of the network. To further define this point, every and all parent (incoming) edges to a node that has multiple parents (a reticulation node) are not tree-edges. I am considering the outgoing edge from a reticulation node, and all other edges in the network, to be a tree edge.
I attempted to use list comprehension for this task like such:
getTreeEdges :: Network -> [Edge]
getTreeEdges (Node label ps cs) = nub (getTreeEdgesHelp (Node label (Empty:[]) cs))
getTreeEdgesHelp :: Network -> [Edge]
--leaf node case, children is empty list
getTreeEdgesHelp n@(Node _ ps []) = [Edge (p, n) | p <- ps ]
--reticulation node and root case, there is more than one parent or parent is empty(root)
getTreeEdgesHelp (Node _ ps cs) | ((length ps) > 1||ps==(Empty:[])) = (concat (map getTreeEdgesHelp cs))
--interal node case
getTreeEdgesHelp n@(Node _ ps cs) = ([Edge (p, n) | p <- ps ])++( concat (map getTreeEdgesHelp cs))
As you can see, the list comprehension is used to store the parent edges for leaves and for internal edges, while the root and reticulation nodes skip over adding edges to the list and just recurse. Leaves end the recursion.
The problem I have is that the edges stored do not contain all of the expected information. More specifically the parent information of the parent nodes disappears. The other information remains, the label and the children are unaffected.
So for example, say I store the edge (a,b)
. The network b
is stored exactly correctly. The network a
has itself (the root node) and all the descendant nodes present and labelled properly, but they all lack any entry for their parents, the parents to every node is []
.
Here is an example of a given input, output, and expected output with newick format (a,b)c for a node c that has children a and b. I also include the parent(s) of a given node in square brackets after the label. For full disclosure, here is how I define Show
for the Network
and the Edge
.
instance Show Network where
show Empty = "_"
show (Node a ps []) = (show a)++"["++printlabels ps++"]"
show (Node a ps [x,y]) = "("++show x++","++show y ++ ")"++(show a)++"["++printlabels ps++"]"
show (Node a ps [x]) = "("++show x++")"++(show a)++"["++printlabels ps++"]"
printlabels :: [Network] -> [Char]
printlabels [] = " "
printlabels (Empty:_) = "root"
printlabels ((Node label _ _):ns) = (show label)++","++(printlabels ns)
Here is an example tree I have generated. Because the labels are rather long, I have substituted out letters to label the internal nodes and integers to label the leaves.
(((4.0[c],((0.0[f])f[e,h],(1.0[g],2.0[g])g[e])e[c])c[h],(0.0[f])f[e,h])h[d],3.0[d])d[root]
Is the representation of the following graph.
Running the function described above gives:
[(d,h),(h,c),(c,4.0),(c,e),(f,0.0),(e,g),(g,1.0),(g,2.0),(d,3.0)]
with
instance Show Edge where
show ( Edge ((Node labelu _ _), (Node labelv _ _)) ) = "("++(show labelu)++","++(show labelv)++")"
When I query a single edge, say the one at position 5 (e,g)
and rewrite Show
for Edge
to
instance Show Edge where
show ( Edge (u, v) ) = "("++(show u)++"=1, 2="++(show v)++")"
I get (((1.0[ ],2.0[ ])g[ ],0.0[ ])e[ ]=1, 2=(1.0[g],2.0[g])g[e])
whereas the expected result would be (((1.0[g],2.0[g])g[e],(0.0[f])f[e,h])e[c]=1, 2=(1.0[g],2.0[g])g[e])
Another error I have noticed is that 0.0
is given as the sibling to g
which is not true. As a commenter pointed out, I am not changing any nodes when trying to get tree edges, I am just assigning the nodes to an appropriate edge pair and storing that in a list. You can also see the network I have input has correct parentage and structure.
What could be going on here? I am new to Haskell, am I using list comprehension wrong?