I haven't been able to find this in existing questions or in an R package; hopefully it is straightforward.

Take two hypothetical genetic sequences:

Sequence A: ATG CGC AAC GTG GAG CAT
Sequence B: ATG GGC TAC GTG GAT CAA

I want R code that computes the percentage of single-nucleotide positions that differ between the two sequences (e.g. 15%).

Any thoughts? Thanks in advance.

easwee
Nick Crouch

1 Answer

If I understand your question correctly, you just need an element-wise string comparison. For example,

R> seq1 = c("A", "T", "G", "C", "G", "C", 
            "A", "A", "C", "G", "T", "G", 
            "G", "A", "G", "C", "A", "T")
R> seq2 = c("A", "T", "G", "G", "G", "C", 
            "T", "A", "C", "G", "T", "G", 
            "G", "A", "T", "C", "A", "A")
R> seq1 != seq2
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE  TRUE FALSE FALSE  TRUE
R> sum(seq1 != seq2)/length(seq1)*100
[1] 22.22

To get your data in the above format, have a look at the strsplit function.
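
As a minimal sketch of that conversion, the snippet below takes the two sequences as plain strings, exactly as typed in the question, and splits them into single characters with strsplit before comparing. The helper name pct_diff is hypothetical (it is not part of base R or any package), and the sketch assumes both sequences have the same length once whitespace is removed.

```r
# Hypothetical helper: percentage of positions at which two
# equal-length sequences differ.
pct_diff <- function(a, b) {
  # Drop whitespace so codon-spaced input such as "ATG CGC" works.
  a <- gsub("\\s+", "", a)
  b <- gsub("\\s+", "", b)
  # Assumption: sequences are already aligned and of equal length.
  stopifnot(nchar(a) == nchar(b))
  # strsplit() returns a list; [[1]] extracts the character vector.
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # mean() of a logical vector is the fraction of TRUE values.
  mean(x != y) * 100
}

seq_a <- "ATG CGC AAC GTG GAG CAT"
seq_b <- "ATG GGC TAC GTG GAT CAA"
pct_diff(seq_a, seq_b)
```

Using mean() on the logical vector is equivalent to sum(...)/length(...) above. For sequences of unequal length an alignment step would be needed first, which this sketch does not attempt.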

csgillespie