
I'm trying to read a Newick-format tree into R. My tree file looks like this:

("Duplodnaviria":("Heunggongvirae":("Peploviricota":("Herviviricetes":("Herpesvirales":("Alloherpesviridae":(),"Herpesviridae":()))),"Uroviricota":("Caudoviricetes":("Caudovirales":("Herelleviridae":(),"Myoviridae":()))))),"Monodnaviria":("Loebvirae":("Hofneiviricota":("Faserviricetes":("Tubulavirales":("Inoviridae":())))),"Shotokuvirae":("Cossaviricota":("Mouviricetes":("Polivirales":("Bidnaviridae":())),"Papovaviricetes":("Sepolyvirales":("Polyomaviridae":()),"Zurhausenvirales":("Papillomaviridae":())),"Quintoviricetes":("Piccovirales":("Parvoviridae":()))),"Cressdnaviricota":("Arfiviricetes":("Cirlivirales":("Circoviridae":()),"Mulpavirales":("Nanoviridae":())),"Repensiviricetes":("Geplafuvirales":("Geminiviridae":(),"Genomoviridae":()))))),"Riboviria":("Orthornavirae":("Duplornaviricota":("Chrymotiviricetes":("Ghabrivirales":("Megabirnaviridae":(),"Quadriviridae":(),"Totiviridae":())),"Resentoviricetes":("Reovirales":("Reoviridae":())),"Vidaverviricetes":("Mindivirales":("Cystoviridae":()))),"Kitrinoviricota":("Alsuviricetes":("Hepelivirales":("Alphatetraviridae":(),"Benyviridae":(),"Hepeviridae":(),"Matonaviridae":()),"Martellivirales":("Bromoviridae":(),"Closteroviridae":(),"Endornaviridae":(),"Kitaviridae":(),"Mayoviridae":(),"Togaviridae":(),"Virgaviridae":()),"Tymovirales":("Alphaflexiviridae":(),"Betaflexiviridae":(),"Tymoviridae":())),"Flasuviricetes":("Amarillovirales":("Flaviviridae":())),"Magsaviricetes":("Nodamuvirales":("Nodaviridae":())),"Tolucaviricetes":("Tolivirales":("Tombusviridae":()))),"Lenarviricota":("Amabiliviricetes":("Wolframvirales":("Narnaviridae":())),"Howeltoviricetes":("Cryppavirales":("Mitoviridae":())),"Miaviricetes":("Ourlivirales":("Botourmiaviridae":()))),"Negarnaviricota":("Ellioviricetes":("Bunyavirales":("Arenaviridae":(),"Fimoviridae":(),"Hantaviridae":(),"Nairoviridae":(),"Peribunyaviridae":(),"Phenuiviridae":(),"Tospoviridae":())),"Insthoviricetes":("Articulavirales":("Amnoonviridae":(),"Orthomyxoviridae":())),"Milneviricetes":("Serpentovirales":("Aspiviridae":())),"Monjiviricetes":("Mononegavirales":("Bornaviridae":(),"Filoviridae":(),"Nyamiviridae":(),"Paramyxoviridae":(),"Pneumoviridae":(),"Rhabdoviridae":()))),"Pisuviricota":("Duplopiviricetes":("Durnavirales":("Amalgaviridae":(),"Hypoviridae":(),"Partitiviridae":(),"Picobirnaviridae":())),"Pisoniviricetes":("Nidovirales":("Coronaviridae":(),"Mesoniviridae":(),"Roniviridae":(),"Tobaniviridae":()),"Picornavirales":("Caliciviridae":(),"Dicistroviridae":(),"Iflaviridae":(),"Picornaviridae":(),"Secoviridae":()),"Sobelivirales":("Solemoviridae":())),"Stelpaviricetes":("Patatavirales":("Potyviridae":()),"Stellavirales":("Astroviridae":())))),"Pararnavirae":("Artverviricota":("Revtraviricetes":("Blubervirales":("Hepadnaviridae":()),"Ortervirales":("Caulimoviridae":(),"Metaviridae":(),"Pseudoviridae":(),"Retroviridae":()))))),"Varidnaviria":("Bamfordvirae":("Nucleocytoviricota":("Megaviricetes":("Algavirales":("Phycodnaviridae":()),"Imitervirales":("Mimiviridae":()),"Pimascovirales":("Ascoviridae":(),"Iridoviridae":(),"Marseilleviridae":())),"Pokkesviricetes":("Asfuvirales":("Asfarviridae":()),"Chitovirales":("Poxviridae":()))),"Preplasmiviricota":("Maveriviricetes":("Priklausovirales":("Lavidaviridae":())),"Tectiliviricetes":("Rowavirales":("Adenoviridae":()))))));

When I read this into R with ape::read.tree(), it appears to correctly read in the structure of the tree, but it doesn't import any of the tip labels:

> ape::read.tree(file="res_4.tree")

Phylogenetic tree with 84 tips and 179 internal nodes.

Tip labels:
  , , , , , , ...

Unrooted; includes branch lengths.

When I read it in with phytools::read.newick(), it doesn't import the structure of the tree, and although it does import tip labels, those labels are wrong: many of them end in '...icota' or '...virales', but every tip is a family, so all of the tip names should end in '...idae'.

> phytools::read.newick(file="res_4.tree")
Read 1 item

Phylogenetic tree with 84 tips and 1 internal nodes.

Tip labels:
  "Duplodnaviria", "Herpesviridae", "Uroviricota", "Myoviridae", "Monodnaviria", "Shotokuvirae", ...

Unrooted; includes branch lengths.
There were 50 or more warnings (use warnings() to see the first 50)


> warnings()
Warning messages:
1: In getEdgeLength(text, i) : NAs introduced by coercion
2: In getEdgeLength(text, i) : NAs introduced by coercion
3: In getEdgeLength(text, i) : NAs introduced by coercion
4: In getEdgeLength(text, i) : NAs introduced by coercion

I've tried playing around with the Newick tree (replacing " with ', switching between { and (, copying and pasting the text instead of reading it in as a file, etc.). Is there an issue I'm missing with the structure of the Newick tree?

Thank you!
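For reference: in standard Newick, a colon introduces a branch length, and an internal node's label is written after its closing parenthesis. So `"Name":(children)` would normally be written `(children)Name`, and a leaf is just its name with no `()`. The sketch below rewrites the notation from the question accordingly. `to_newick()` is a hypothetical helper written for this illustration, not a function from ape or phytools, and it assumes double-quoted labels that contain no spaces, quotes or semicolons.

```r
# Hypothetical helper (not part of ape or phytools): rewrite the
# "Name":(children) notation from the question into standard Newick,
# where each internal label follows its closing parenthesis and a
# leaf written as "Name":() becomes just Name.
# Assumes double-quoted labels with no spaces, quotes or semicolons inside.
to_newick <- function(s) {
  s <- gsub("[[:space:];]", "", s)   # drop whitespace/newlines and the final ";"
  chars <- strsplit(s, "")[[1]]
  pos <- 1

  parse_node <- function() {
    stopifnot(chars[pos] == "\"")
    end <- pos + 1
    while (chars[end] != "\"") end <- end + 1   # find the closing quote
    name <- paste(chars[(pos + 1):(end - 1)], collapse = "")
    pos <<- end + 1
    stopifnot(chars[pos] == ":", chars[pos + 1] == "(")
    pos <<- pos + 2                    # step past ":("
    kids <- character(0)
    while (chars[pos] != ")") {
      kids <- c(kids, parse_node())
      if (chars[pos] == ",") pos <<- pos + 1
    }
    pos <<- pos + 1                    # step past this node's ")"
    if (length(kids) == 0) return(name)  # empty () marks a leaf
    paste0("(", paste(kids, collapse = ","), ")", name)
  }

  # The outermost parentheses hold the unnamed root's children
  stopifnot(chars[1] == "(")
  pos <- 2
  top <- character(0)
  while (chars[pos] != ")") {
    top <- c(top, parse_node())
    if (chars[pos] == ",") pos <- pos + 1
  }
  paste0("(", paste(top, collapse = ","), ");")
}

to_newick('("A":("B":(),"C":("D":())),"E":());')
# [1] "((B,(D)C)A,E);"
```

Used on the file, something like `ape::read.tree(text = to_newick(paste(readLines("res_4.tree"), collapse = "")))` should then give a tree whose tips are the families and whose node labels are the higher ranks. Note that ranks with a single child (e.g. `(D)C` above) become single-child nodes.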

  • This doesn't quite look like the Newick-formatted text I'm familiar with. I'd expect commas where I see colons; colons usually signify that an edge weight is about to follow. And are the empty `()` intentional? – Martin Smith Feb 21 '22 at 14:18
  • Aha, it imports OK now after replacing the colons with commas. The empty brackets were not intentional: I have the phylogenetic data in data-frame form, but I've converted it into Newick for visualising, running ASRs, etc. Now the tree is plotting every taxon level as a tip ([phylogeny as visualised](https://drive.google.com/file/d/1fzkdKa_-WiMreJo2sA4jxbfj3HPU0Flv/view); [actual phylogeny](https://drive.google.com/file/d/1YpcLZcdSa46iNKB8j7qiOZDKKb9EMnfZ/view)). Perhaps the Newick is badly formatted, and it's time to revisit the script I'm using to convert from data-frame format? – bioinfo1 Feb 21 '22 at 17:21
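Every taxon level showing up as a tip is what you would expect if the higher ranks are written as siblings of their children rather than as labels of internal nodes. A minimal base-R sketch of a data-frame-to-Newick conversion that avoids this, assuming a hypothetical table with one row per family and one column per rank ordered from highest to lowest (the column names `order` and `family` and the toy subset of families are illustrative, not the asker's actual data):

```r
# Hypothetical sketch: build a Newick string from a taxonomy table with one
# row per family and one column per rank, ordered from highest to lowest.
# Column names are illustrative; labels are assumed to contain no spaces,
# commas, colons, semicolons or parentheses.
taxonomy_to_newick <- function(df, ranks) {
  build <- function(rows, level) {
    col <- ranks[level]
    values <- unique(rows[[col]])
    if (level == length(ranks)) return(as.character(values))  # lowest rank = tips
    vapply(values, function(v) {
      kids <- build(rows[rows[[col]] == v, , drop = FALSE], level + 1)
      # The internal label goes AFTER the closing parenthesis
      paste0("(", paste(kids, collapse = ","), ")", v)
    }, character(1), USE.NAMES = FALSE)
  }
  paste0("(", paste(build(df, 1), collapse = ","), ");")
}

tax <- data.frame(
  order  = c("Herpesvirales", "Herpesvirales", "Caudovirales"),
  family = c("Alloherpesviridae", "Herpesviridae", "Myoviridae"),
  stringsAsFactors = FALSE
)
taxonomy_to_newick(tax, c("order", "family"))
# [1] "((Alloherpesviridae,Herpesviridae)Herpesvirales,(Myoviridae)Caudovirales);"
```

Only the lowest rank becomes tips; every other rank becomes a labelled internal node. A rank with a single child, such as `(Myoviridae)Caudovirales` here, produces a single-child node; if those get in the way, `ape::collapse.singles()` can remove them after reading.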

1 Answer


I had a similar problem. I think it is due to the linear structure of the tree (a chain of nodes that each have a single child). I ran these tests:

ape::read.tree(text = "(((seq2:1)seq3:1)seq1:30)Ancestor:0;" )  #no tips

I added an artificial sequence to create a bifurcation in the tree, and it worked:

ape::read.tree(text = "(((seq2:1, seqX:100)seq3:1)seq1:30)Ancestor:0;" ) #works

I don't know whether the code can be changed to fix this bug at the source, sorry!

My workaround, which is not very elegant, is to add the artificial sequence with a regex operation and then run:

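Tree_ape <- ape::read.tree(text = "(((seq2:1, seqX:100)seq3:1)seq1:30)Ancestor:0;")  # tree with the dummy tip seqX added
ape::drop.tip(Tree_ape, tip = "seqX", trim.internal = FALSE, collapse.singles = FALSE)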

I get a tree with only one tip and all the internal nodes:

Tip labels: seq2
Node labels: Ancestor, seq1, seq3
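The regex step itself isn't shown. One possible version, as a sketch assuming unquoted labels: in a Newick string the first tip is always the label that follows the leading run of `(`, so a single `sub()` can place a dummy sibling right after it. The helper name `add_dummy_tip()` and its defaults are hypothetical.

```r
# Hypothetical helper: insert a dummy sibling tip next to the first tip of a
# Newick string, so a single-tip ("linear") tree gets a bifurcation.
# Assumes unquoted labels that contain none of ( ) , : ;
add_dummy_tip <- function(newick, label = "seqX", edge = 100) {
  # Group 1 = leading "(" run + first tip label + optional ":branch length"
  sub("^(\\(+[^(),:;]+(:[^(),;]+)?)",
      paste0("\\1,", label, ":", edge),
      newick)
}

add_dummy_tip("(((seq2:1)seq3:1)seq1:30)Ancestor:0;")
# [1] "(((seq2:1,seqX:100)seq3:1)seq1:30)Ancestor:0;"
```

The result can then be passed to `ape::read.tree(text = ...)`, and the dummy tip removed with `ape::drop.tip()` using `trim.internal = FALSE` and `collapse.singles = FALSE`.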

It could be worth adding ape to the tags, since the bug may come from this package.

I hope this will help!

eglh