I am trying to plot several phylogenetic trees from the same data using ggtree and ggplot2. However, the individual trees are drawn at different sizes, even though they are all constructed from the same alignment.

As I cannot share my research data, I am sharing a fake FASTA file generated using ChatGPT (free version, May 24) and aligned using Clustal Omega. I have confirmed that the same problem occurs with my research data.

>Sequence_1
-------------------------ACTCTAAAACGCGATGGCTCTCCTAATAGCATTG-
---GTCTGTGTCATCTCGC------------
>Sequence_2
---------------------TATGCTACAAGCAGTCAATTTAGGTATGTGGGGCCAGC-
------CGGGC----CGATCGA---------
>Sequence_3
----------------------GTTGAGGAGG--GTCCCTTAAAGGCGGGGGCGCAGATA
AGTCGTCATTACTT-----------------
>Sequence_4
--------------------ATAACAATCCAAAGTC-TCTCGTCT---TCATAG-GAAG-
---GTTTGGGGCTGCTGTG------------
>Sequence_5
---------------------CAAACAGAGTACAT--ATATTCATAGAATTGTGGTTTT-
---CTCAGTAATAAGTG--------------
>Sequence_6
------------AGACCGATTTGTTTAGGACG--GGGGCATGGTCG--------GAAGAG
TTCGGTCGCGA-------------------C
>Sequence_7
-----------------------------GTGCGTCGTGGTGTGCTCCCTTTCG--TCC-
---CTCGCTCGGTTGCGATTGGACA------
>Sequence_8
----------GCCCAGCCAGAGCTAGTCCATGCCGGCACTTGACTAGTGGTCCAGGAGT-
------C------------------------
>Sequence_9
-------CCATCACGGGTTC-GC--------------GACTGAACCCTCGATACCAA---
---TCCTAAGCCAGCTGT-------------
>Sequence_10
-----------CGATGTTAAAAGCTAAGTG----TCCGTTTAAAGCGATATC----TGA-
------TCAAC----CGCTC-----------
>Sequence_11
-GGGAGTGGGTAAAGAGACACATAGCATAGA---------TGTCCTA--A----------
-----ACGCAATCGGGC--------------
>Sequence_12
---------------------------GCGAGAACGTACGTGCGGTCTCGTCTAGTATG-
---CTGGATTATGTCAC-TCCG---------
>Sequence_13
---------------------ACGAGAGACGAAGC-----G--TAC---TTCTGTGGTG-
---CTGCGCAATTCCACCCTTGAAA------
>Sequence_14
-----------------------------ATCCCA--ATATGTGCCCAATGCTGTGTTC-
---TAGGATAATACGTCTCATTTCT------
>Sequence_15
-----------------------ATCAGAATTAGATATGAGAACATTTCGTGGA---TC-
---GTTTACCCTTGACCTGC-----------
>Sequence_16
--------------CCCGATAATTTCCGT--------------GAACACGTGACGCTCA-
---GAGTATGTTGTCCTAGACG---------
>Sequence_17
-----------------CCTCGAGTCAA-ACCGCCG-ATCGGATATACAGGTGG------
---CCCCGGTTCTTTTGC-------------
>Sequence_18
------CAGGTAACATTACGCATATCGCCTT---------GGAGTAA--AC---------
---CAGCTAAATCTCCGAT------------
>Sequence_19
--------------------CCTGTTGGATTTAATA-TCCCCACACACCAATGACAATG-
---GTCTGTATCAGA----------------
>Sequence_20
---------GCCCACCGGACTTCTTTGGTGCTCGCGTGATTAAATTATCGGG----ATT-
------CACA---------------------
>Sequence_21
-----------------------------------------ACCGCCAGCCGAGCATATA
TTATCGCCTAACAGGCGCGGGTATTAGGGAC
>Sequence_22
-----------------GCTGTCTGCAT-ATGCCCG-TTCAAGCACGGAGATGA------
---CCCGCCTATTGCTAG-------------
>Sequence_23
-----------------------------AAGTCATCTCGTGATATCCCGTGCGGGTTC-
---CAGCCGGCGAGAGGATCGGT--------
>Sequence_24
AGGATTGTGTGCATTCTAAA------------------------AGGACCCTATAAT---
---TTGTAAGGAACCGTAGC-----------
>Sequence_25
----AGTACAGAAGCCGTCACTCACAAGAGT---------CCACAGA--AC---------
---AAGGGTATCATGAA--------------
>Sequence_26
---ATGTGATTAACGGTAAG-TA--------------CGCTTAACCCT--------A---
---AGCGACACTCCCGACGGGC---------
>Sequence_27
--------CCACTCTCGGAC-------------------AT---TATTACCTTCAAT---
---CGCTGGCATAGCAGCAGGCATTC-----
>Sequence_28
---------AGCTACCACACATCGGCAGTGCATGATCACTAGAATCGATGGGGAAAAGT-
-------------------------------
>Sequence_29
-------------AGCCACTAAAGTCAGC--------------GATAACGTTCCTAATC-
---GAAGGGGACCTCAATCTA----------
>Sequence_30
--------------------CTAATTAGGGCAAACATTATTCCTATAGAATTAGGCCTG-
---GGTTGAGTGAG-----------------
>Sequence_31
--------CCGCACGCGTTC-CC--------------ACCT---GGGTAACTATTTG---
---ATCTGTTAGTCCACAGTGC---------
>Sequence_32
---CTGGGCCTAACCGGAAC-------------------CC---TCTTGCCGTCTCT---
---CGCCCAATTATCTAGTGT----------
>Sequence_33
---AGCGGATGGGCATTAGC------------------------AAAACGCTTTCAT---
---GTGGGTGCTACCGTGTCCAA--------
>Sequence_34
--------TGGGACTGCGAT-GC--------------TAGTGCACATTCTATCACAA---
---GACGACACCAACGTCG------------
>Sequence_35
------------------GACGATAGAGAATAACC-----GCGCCCGGATGTAGATGTG-
---CTGGGTAACTGACA--------------
>Sequence_36
---ATAGTCTCTTAGCGCAC-CC--------------AAAT---AGGTACCTTGTCG---
---GTCTATAGTATCAG--------------
>Sequence_37
----------------------TATTCCCCCCATCTCAACTGAGGCCTGTGCGA------
-----TTTGAATTCATATTTCTA--------
>Sequence_38
----------------------------------GTCACTGACACGCAGGTGCGCGAGCA
TACGATCCAGTCTGCCGCGTGAAT-------
>Sequence_39
---CATTTCTGTAATAGGCTGGTAACGTAAGGACTG-ACTCTACACCAAAATAA------
-------------------------------
>Sequence_40
----------------CTAGCTATTCATCCTTGGCACAGAAGCTCAGTTGCT----CTC-
------CCGAC----AGCGCG----------
>Sequence_41
---------------CATATATGCGCAAGTTTGGATACCTACACAGGAGTCGGGACAGC-
------CCGGC----C---------------
>Sequence_42
---------------------------CGGTTATATTTGTTGATATACCGTAGAGGTAC-
---CGCGTAACCCGATGTGAC----------
>Sequence_43
--------------GCGGGTCTTGCCAGCACAAACTATTCTGTTAGAGCCTCTGCTATA-
---CTCTT-----------------------
>Sequence_44
----TTTTCTCCACACCAAGGTCGTCAG-ATTTCCC-ACCACTCATCAAGCTGA------
---CG--------------------------
>Sequence_45
--TGACAGCACTAGCCGAATGAAATCACCGG---------CGTCGCA--A----------
-----GCGCAATACCCGT-------------
>Sequence_46
------TATTTTAGAGTTCGGATACG-AAGG---------GGACAAA--AC---------
---CAGCGTAAGCTGTTTAT-----------
>Sequence_47
-----------CACAGTAGTATGTGCCACACTTCTGTGAAGGATCAATTCGT--------
------CGGGC----ATGG------------
>Sequence_48
---------------TTTTTATAAGAAGTAAGC-AGCGGGTGACTCTATGAATCCGCAT-
------CCTAC----CG--------------
>Sequence_49
----------ACAGACTAATGTTTACTTCGAT--CTCGTTGGCGGGG-------------
------AGGCACTGCCGAATA----------
>Sequence_50
-----------GAGACTACCTTTTTCTGGAGA--CCATAACAGTACA-------------
------CTTCACTAGACCGTAT---------

In RStudio, I am generating four types of trees using the package 'phangorn' as follows:

library(phangorn)

# Read the alignment and compute ML pairwise distances
fake_phyDat = read.phyDat("~/Path/fake_aligned.fasta", format = "fasta")
fake_dist_mat = dist.ml(fake_phyDat)

# Build one tree per method from the same data
fake_NJ = NJ(fake_dist_mat)
fake_UPGMA = upgma(fake_dist_mat)
fake_ML = pml_bb(fake_phyDat, model = "JC", rearrangement = "NNI")$tree
fake_MP = optim.parsimony(fake_NJ, fake_phyDat)

Then I convert them to ggtree objects as follows:

library(ggtree)
library(ggplot2)

fakeNJ = ggtree(fake_NJ, layout = "circular", branch.length = "none", col = "#3366CC")
fakeUPGMA = ggtree(fake_UPGMA, layout = "circular", branch.length = "none", col = "#2D718EFF")
fakeMP = ggtree(fake_MP, layout = "circular", branch.length = "none", col = "#20A386FF")
fakeML = ggtree(fake_ML, layout = "circular", branch.length = "none", col = "#74D055FF")

All four plots are built with exactly the same code pattern, but I am sharing each of them for completeness.

FAKENJ = fakeNJ + geom_tiplab(alpha = 0.5, size = 2) +
  labs(title = "Neighbour Joining", subtitle = "fake") +
  theme(# plot.margin = margin(20, 20, 20, 20),
        plot.title = element_text(size = 14),
        plot.subtitle = element_text(size = 12)) +
  xlim(NA, 25)

FAKEUPGMA = fakeUPGMA + geom_tiplab(alpha = 0.5, size = 2) +
  labs(title = "UPGMA", subtitle = "matk") +
  theme(# plot.margin = margin(20, 20, 20, 20),
        plot.title = element_text(size = 14),
        plot.subtitle = element_text(size = 12)) +
  xlim(NA, 25)

FAKEMP = fakeMP + geom_tiplab(alpha = 0.5, size = 2) +
  labs(title = "Maximum Parsimony", subtitle = "fake") +
  theme(# plot.margin = margin(20, 20, 20, 20),
        plot.title = element_text(size = 14),
        plot.subtitle = element_text(size = 12)) +
  xlim(NA, 25)

FAKEML = fakeML + geom_tiplab(alpha = 0.5, size = 2) +
  labs(title = "Maximum Likelihood", subtitle = "fake, Model: JC") +
  theme(# plot.margin = margin(20, 20, 20, 20),
        plot.title = element_text(size = 14),
        plot.subtitle = element_text(size = 12)) +
  xlim(NA, 25)
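
Since the four blocks above differ only in the tree object, the title and the subtitle, the shared steps could be sketched as a single helper. This is only an illustration and not part of the original script: `style_tree`, `demo_tree` and `demo_plot` are hypothetical names, and a random tree from `ape::rtree` stands in for the real trees so that the snippet runs on its own.

```r
library(ape)      # only used here to make a stand-in tree
library(ggtree)
library(ggplot2)

# Hypothetical helper: applies the same tip labels, titles, theme and
# x limit to any ggtree object, mirroring the four blocks above.
style_tree <- function(p, title, subtitle, x_max = 25) {
  p +
    geom_tiplab(alpha = 0.5, size = 2) +
    labs(title = title, subtitle = subtitle) +
    theme(plot.title = element_text(size = 14),
          plot.subtitle = element_text(size = 12)) +
    xlim(NA, x_max)
}

# Stand-in data (assumption): a random 10-tip tree instead of fake_NJ etc.
set.seed(1)
demo_tree <- rtree(10)
demo_plot <- style_tree(
  ggtree(demo_tree, layout = "circular", branch.length = "none"),
  title = "Demo tree", subtitle = "random"
)
```

With the real objects, the NJ plot would then be `style_tree(fakeNJ, "Neighbour Joining", "fake")`, which makes it easier to see that every tree receives identical settings.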

fake = ggarrange(FAKENJ, FAKEUPGMA, FAKEMP, FAKEML, nrow = 2, ncol = 2)

Although every setting is identical, the trees are drawn at different sizes, both when compared side by side and when saved as individual images using the same ggsave command.

Fig 1

I am aware of xlim in ggplot2, and I have tried setting it to a custom value for each tree, but that does not solve the issue. Removing xlim altogether blows up all the trees and makes them touch the edges of the plot, as shown in figure 2.

Fig 2

Also, why does ggarrange leave an empty gap between the left and right columns?

Phil