0

I have two phylogenetic trees which have the same topology (expect for branch lengths):

In R using ape:

t1 <- ape::read.tree(file="",text="(((HS:72,((CP:30,CL:30.289473923):32,RN:62):10):2,(CS:63,BS:63):11):5,LA:79);")
t2 <- ape::read.tree(file="",text="(((((CP:39,CL:39):29,RN:68):9,HS:77):5,(BS:63,CS:63):19):14,LA:96);")
> ape::all.equal.phylo(t1,t2,use.edge.length = F,use.tip.label = T)
[1] TRUE

I want to compute the mean branch lengths across the two but the problem is that although their topologies are identical the order at which their nodes are represented is not identical, and not all tree nodes are labeled tips so I don't think there's a simple join solution:

> head(tidytree::as_tibble(t1))
# A tibble: 6 x 4
  parent  node branch.length label
   <int> <int>         <dbl> <chr>
1     10     1          72   HS   
2     12     2          30   CP   
3     12     3          30.3 CL   
4     11     4          62   RN   
5     13     5          63   CS   
6     13     6          63   BS   
> tail(tidytree::as_tibble(t1))
# A tibble: 6 x 4
  parent  node branch.length label
   <int> <int>         <dbl> <chr>
1      8     8            NA NA   
2      8     9             5 NA   
3      9    10             2 NA   
4     10    11            10 NA   
5     11    12            32 NA   
6      9    13            11 NA   
> head(tidytree::as_tibble(t2))
# A tibble: 6 x 4
  parent  node branch.length label
   <int> <int>         <dbl> <chr>
1     12     1            39 CP   
2     12     2            39 CL   
3     11     3            68 RN   
4     10     4            77 HS   
5     13     5            63 BS   
6     13     6            63 CS   
> tail(tidytree::as_tibble(t2))
# A tibble: 6 x 4
  parent  node branch.length label
   <int> <int>         <dbl> <chr>
1      8     8            NA NA   
2      8     9            14 NA   
3      9    10             5 NA   
4     10    11             9 NA   
5     11    12            29 NA   
6      9    13            19 NA   

So it's not clear to me how I'd correspond between any pair of branch lengths in order to take their mean.

Any idea how to match them or reorder t2 according to t1?

Supposedly phytools' matchNodes method is meant for that but it doesn't seem like it's getting it right:

phytools::matchNodes(t1, t2,method = "descendants")
     tr1 tr2
[1,]   8   8
[2,]   9   9
[3,]  10  10
[4,]  11  11
[5,]  12  12
[6,]  13  13

At least I'd expect it to correspond the tips correctly, meaning:

dplyr::left_join(dplyr::filter(tidytree::as_tibble(t1),!is.na(label)) %>% dplyr::select(node,label) %>% dplyr::rename(t1.node=node),
+                  dplyr::filter(tidytree::as_tibble(t2),!is.na(label)) %>% dplyr::select(node,label) %>% dplyr::rename(t2.node=node))
Joining, by = "label"
# A tibble: 7 x 3
  t1.node label t2.node
    <int> <chr>   <int>
1       1 HS          4
2       2 CP          1
3       3 CL          2
4       4 RN          3
5       5 CS          6
6       6 BS          5
7       7 LA          7

But that's not happening.

Ultimately the information for matching is in these tree tibbles because they list the parents of each node, but practically using that information for matching the modes probably requires some recursive steps.

dan
  • 6,048
  • 10
  • 57
  • 125

1 Answers1

0

Seems like ape's makeNodeLabel using the md5sum as the method argument, which labels the internal nodes by the tip labels achieves that:

t1 <- ape::read.tree(file="",text="(((HS:72,((CP:30,CL:30.289473923):32,RN:62):10):2,(CS:63,BS:63):11):5,LA:79);")
t2 <- ape::read.tree(file="",text="(((((CP:39,CL:39):29,RN:68):9,HS:77):5,(BS:63,CS:63):19):14,LA:96);")

dplyr::left_join(tidytree::as_tibble(ape::makeNodeLabel(t1, method = "md5sum")) %>% dplyr::select(node,label) %>% dplyr::rename(t1.node=node),
                 tidytree::as_tibble(ape::makeNodeLabel(t2, method = "md5sum")) %>% dplyr::select(node,label) %>% dplyr::rename(t2.node=node))

Joining, by = "label"
# A tibble: 13 x 3
   t1.node label                            t2.node
     <int> <chr>                              <int>
 1       1 HS                                     4
 2       2 CP                                     1
 3       3 CL                                     2
 4       4 RN                                     3
 5       5 CS                                     6
 6       6 BS                                     5
 7       7 LA                                     7
 8       8 da5f57f0a757f7211fcf84c540d9531a       8
 9       9 9bffe86cf0a2650b6a3f0d494c0183a9       9
10      10 bcf7b41992a064acd2e3e66fee7fe2d4      10
11      11 d50e0698114c621b49322697267900b7      11
12      12 f0a8c7fa67831514e65cdadbc68c3d31      12
13      13 82ab4cf8ae4a4df14cf87a48dc2638e0      13
dan
  • 6,048
  • 10
  • 57
  • 125