I want to count how often each of four letters (A, T, G, C) occurs at every position across a set of strings, for example:

TGAGGTAGTAGTTTGTGCTGTTAT
TAGTAGTTTGTGCTGTTA
TGAGGTAGTAGTTTGTAC
TGAGAACTGAATTCCATAGG

Desired output:

   Pos1  Pos2  Pos3  and so on
A  0     1
T  4     0
C  0     0
G  0     3

So far I have used an R package called Biostrings, which works, but I wonder whether this can also be done in Perl?

user3741035
  • What have you tried so far? – IrishGeek82 Jun 18 '14 at 02:52
  • ... and why do you want to switch to Perl? – Ben Bolker Jun 18 '14 at 03:47
  • Perl will not *do it*, but you can *do it* in Perl. But why? – ooga Jun 18 '14 at 03:49
  • In the past, such low-quality questions would receive at most a few down-votes, but this one shows some unhealthy, religious-like zeal. Let's examine whether that's the community we want to be. The OP may have a perfectly reasonable explanation for why Perl is the preferred language here. Let's keep an open mind, shall we? – Roman Luštrik Dec 04 '20 at 09:41

1 Answer

For the record, for

x = "TGAGGTAGTAGTTTGTGCTGTTAT
TAGTAGTTTGTGCTGTTA
TGAGGTAGTAGTTTGTAC
TGAGAACTGAATTCCATAGG"

a Biostrings solution is

library(Biostrings)
consensusMatrix(DNAStringSet(strsplit(x, "\n")[[1]]))

which will be fast for millions of sequences.
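
For readers who want to see what the per-position count involves without Biostrings, here is a minimal sketch in plain Python. It is only an illustration of the idea, not how consensusMatrix is implemented; the function name position_counts and its alphabet parameter are made up for this example, and it assumes sequences are left-aligned so that shorter ones simply contribute nothing to later positions.

```python
from collections import Counter


def position_counts(seqs, alphabet="ACGT"):
    """Count how often each letter occurs at each position across sequences.

    Hypothetical helper for illustration only. Sequences are assumed to be
    left-aligned; letters outside `alphabet` are tallied but not reported.
    """
    width = max((len(s) for s in seqs), default=0)
    # One Counter per column (position).
    columns = [Counter() for _ in range(width)]
    for seq in seqs:
        for i, letter in enumerate(seq):
            columns[i][letter] += 1
    # Rows are letters, columns are positions (index 0 is Pos1).
    return {letter: [col[letter] for col in columns] for letter in alphabet}


seqs = """TGAGGTAGTAGTTTGTGCTGTTAT
TAGTAGTTTGTGCTGTTA
TGAGGTAGTAGTTTGTAC
TGAGAACTGAATTCCATAGG""".split()

counts = position_counts(seqs)
for letter, row in counts.items():
    print(letter, *row)
```

Each printed row starts with the letter, followed by its count at Pos1, Pos2, and so on up to the length of the longest sequence. A single pass like this is easy to port to Perl with an array of hashes, though for millions of sequences a vectorised library is likely to be faster.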

Martin Morgan