3

How to visualize the complete alignment of two sequences?

library(Biostrings)
s1 <-DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAGAAGACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG")
s2 <-DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATGTTTCACTACTTCCTTTCGGGTAAGTGTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATCAAATATATAAATATATAAAAATATAATTTTCATCAAATATATAAAAATATAATTTTCATC")
pairwiseAlignment(s1,s2)

Output:

Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGT--TTTCAC---...CTTCACCAGCTCCCTGGCGGTAAGTTG-ATCAAAGG---AAACGCAAAGTTTTCAAG 
subject: [1] GTTTCACTACTTCCTTTCGGGTAAGTAAAT-ATATGTTTCACTACTTCCTTTCGGGTA...TATATAAATATATAAAAATATAATTTTCATCAAATATATAAAAATATAATTTTCATC 
score: -394.7115 

Here, only a part of alignment has been shown? Do you know of any existing functions that either plot or print the alignment?

Prradep
  • 5,506
  • 5
  • 43
  • 84

1 Answers1

11

You can find information and details on how to extract the aligned pattern and subject sequences under ?pairwiseAlignments.

Here is an example based on the sample data you provide:

  1. Store the pairwise alignment in a PairwiseAlignmentsSingleSubject object

    alg <- pairwiseAlignment(s1,s2)
    
  2. Extract the aligned pattern and subject sequences and merge them into a DNAStringSet object.

    seq <- c(alignedPattern(alg), alignedSubject(alg))
    
  3. You can access the full sequences with as.character

    as.character(seq)
    [1] "ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGT--TTTCAC--------TTCACCAGCTCCCTGGCGGTAAGTTGATC---AAAGG---AAACGCAAAGTTTTCAAGAAGACTTCACCAGCTCCCTGGCGGTAAGTTG-ATCAAAGG---AAACGCAAAGTTTTCAAG"
    [2] "GTTTCACTACTTCCTTTCGGGTAAGTAAAT-ATATGTTTCACTACTTCCTTTCGGGTAAGTGTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATCAA-ATATATAAATATATAAAAATATAATTTTCATCAAATATATAAAAATATAATTTTCATC"
    

    It seems that alignedPattern and alignedSubject were added to Biostrings very recently. Alternatively you can do

    seq <- c(aligned(pattern(alg)), aligned(subject(alg)))
    

    but note that this will trim globally aligned sequences (see details).

  4. There exists a nice R/Bioconductor package DECIPHER which offers a method to visualise XStringSet data in a web browser. It automatically adds colour-coding and a consensus sequence at the bottom. In your case, you would do

    library(DECIPHER)
    BrowseSeqs(seq)
    

    enter image description here

Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • Functions alignedPattern, alignedSubject are not available. – Prradep Sep 17 '18 at 13:00
  • @Prradep They are part of `Biostrings`. Are you having reasonably current R/BIoconductor versions? I'm using R 3.4.0 with `Biostrings_2.48.0`. – Maurits Evers Sep 17 '18 at 13:02
  • Thanks, I have R version 3.4.4; ‘Biostrings’ version 2.46.0. I will update them and check! BTW, do you know any possibility to find stop codons in the alignment? – Prradep Sep 17 '18 at 13:09
  • 1
    @Prradep It seems `alignedPattern` and `alignedSubject` were added fairly recently; I've made an edit to show an alternative. Concerning your second question, SO guidelines suggest you should close this question first by setting the green check mark next to the answer. Then open a new question. – Maurits Evers Sep 17 '18 at 13:16
  • I finally managed to get the system running for `DECIPHER` package. `BrowseSeqs` function works perfectly fine but only generates an external html file. I would like this to force to an image. Does this package have any functions for that? – Prradep Sep 21 '18 at 14:04