This question stems directly from a previous one I asked here:
Phylogenetics in R: collapsing descendant tips of an internal node
((((11:0.00201426,12:5e-08,(9:1e-08,10:1e-08,8:1e-08)40:0.00403036)41:0.00099978,7:5e-08)42:0.01717066,(3:0.00191517,(4:0.00196859,(5:1e-08,6:1e-08)71:0.00205168)70:0.00112995)69:0.01796015)43:0.042592645,((1:0.00136179,2:0.00267375)44:0.05586907,(((13:0.00093161,14:0.00532243)47:0.01252989,((15:1e-08,16:1e-08)49:0.00123243,(17:0.00272478,(18:0.00085725,19:0.00113572)51:0.01307761)50:0.00847373)48:0.01103656)46:0.00843782,((20:0.0020268,(21:0.00099593,22:1e-08)54:0.00099081)53:0.00297097,(23:0.00200672,(25:1e-08,(36:1e-08,37:1e-08,35:1e-08,34:1e-08,33:1e-08,32:1e-08,31:1e-08,30:1e-08,29:1e-08,28:0.00099682,27:1e-08,26:1e-08)58:0.00200056,24:1e-08)56:0.00100953)55:0.00210137)52:0.01233888)45:0.01906982)73:0.003562205)38;
I have created this tree from a gene tree in R, using the midpoint
function to root it. Now I apply the following function to it:
drop_dupes <- function(tree,thres=1e-05){
tips <- which(tree$edge[,2] %in% 1:Ntip(tree))
toDrop <- tree$edge.length[tips] < thres
newtree <- drop.tip(tree,tree$tip.label[toDrop])
return(newtree)
}
Now my issue is that when I apply this function I do not get the tree I expected. The output when plotted looks like this:
However, if I write the tree out to a text file using write.tree
and then read it in again as a Newick string, when I then apply the drop_dupes
function above, I get a different tree, as shown in the below plot:
And this is what is confusing me. Why am I getting two different outputs when applying the same function to the very same tree, with the only difference being if it's read in or it's already a variable in R?
Edit: Here are some further details. The initial gene tree is as follows:
(B.weihenstephanensis.KBAB4.ffn:0.00136179,B.weihenstephanensisWSBC10204.ffn:0.00267375,(((B.cereus.NJW.ffn:0.00191517,(B.thuringiensis.HS181.ffn:0.00196859,(B.thuringiensis.Bt407.ffn:0.00000001,B.thuringiensis.chinensisCT43.ffn:0.00000001)0.879000:0.00205168)0.738000:0.00112995)0.969000:0.01796015,(B.cereus.FORC013.ffn:0.00000005,((B.thuringiensis.galleriaeHD29.ffn:0.00000001,(B.thuringiensis.kurstakiYBT1520.ffn:0.00000001,B.thuringiensis.YWC28.ffn:0.00000001)0.000000:0.00000001)0.971000:0.00403036,(B.cereus.ATCC14579.ffn:0.00201426,B.thuringiensis.tolworthi.ffn:0.00000005)0.000000:0.00000005)0.377000:0.00099978)0.969000:0.01717066)1.000000:0.04615485,(((B.cereus.FM1.ffn:0.00093161,B.cereus.FT9.ffn:0.00532243)0.990000:0.01252989,((B.cereus.AH187.ffn:0.00000001,B.cereus.NC7401.ffn:0.00000001)0.694000:0.00123243,(B.thuringiensis.finitimusYBT020.ffn:0.00272478,(B.cereus.ATCC10987.ffn:0.00085725,B.cereus.FRI35.ffn:0.00113572)0.994000:0.01307761)0.973000:0.00847373)0.972000:0.01103656)0.863000:0.00843782,((B.thuringiensis.9727.ffn:0.00202680,(B.cereus.03BB102.ffn:0.00099593,B.cereus.D17.ffn:0.00000001)0.741000:0.00099081)0.822000:0.00297097,(B.cereus.E33L.ffn:0.00200672,(B.cereus.S28.ffn:0.00000001,(B.thuringiensis.HD1011.ffn:0.00000001,(B.anthracis.Vollum1B.ffn:0.00000001,(B.anthracis.Turkey32.ffn:0.00000001,(B.anthracis.RA3.ffn:0.00099682,(B.anthracis.Pasteur.ffn:0.00000001,(B.anthracis.Larissa.ffn:0.00000001,(B.anthracis.Cvac02.ffn:0.00000001,(B.anthracis.CDC684.ffn:0.00000001,(B.anthracis.BFV.ffn:0.00000001,(B.anthracis.Ames.ffn:0.00000001,(B.anthracis.A16R.ffn:0.00000001,(B.anthracis.A0248.ffn:0.00000001,B.anthracis.A1144.ffn:0.00000001)0.000000:0.00000001)0.000000:0.00000001)0.000000:0.00000001)0.000000:0.00000001)0.000000:0.00000001)0.000000:0.00000001)0.000000:0.00000001)0.000000:0.00000001)0.000000:0.00000005)0.000000:0.00000005)0.956000:0.00200056)0.000000:0.00000006)0.805000:0.00100953)0.809000:0.00210137)0.957000:0.01233888)0.929000:0.01906982)1.000000:0.05586907);
I read it into R as tree1. I then use the following code:
#Function to label nodes and tips as sequential integers
sort.names <- function(tr){
tr$node.label<-(length(tr$tip.label) + 1):(length(tr$tip.label)+ tr$Nnode)
##some of these are tips, some are nodes, need to treat differently
tr$tip.label<-1:(length(tr$tip.label))
return(tr)
}
#Function to check if tree is rooted, and if it is not to use midpoint #rooting
rootCheck <- function(tree){
if(is.rooted(tree) == FALSE){
rootedTree <- midpoint(tree)
}
return(rootedTree)
}
#The above mentioned function to remove duplicate tips
drop_dupes <- function(tree,thres=1e-05){
tips <- which(tree$edge[,2] %in% 1:Ntip(tree))
toDrop <- tree$edge.length[tips] < thres
newtree <- drop.tip(tree,tree$tip.label[toDrop])
return(newtree)
}
#Use functions on tree
a <- rootCheck(tree1)
b <- sort.names(a)
c <- di2multi(b, tol = 1e-05)
d <- drop_dupes(c)
Now at this point if I plot tree d
, I will get the first plot above. However, if I write tree c
to a text file, then read it back in and and then use the drop_dupes
function on it, I will get the latter tree.
I have checked the newick file of tree c
against the Newick tree at the top of the page and it is definitely the same.