I want to find the capital letters in each string and count how many times each one occurs per string. For example:

t = c("gctaggggggatggttactactGtgctatggactac", "gGaagggacggttactaCgTtatggactacT", "gcGaggggattggcttacG")  

ldply(str_match_all(t,"[A-Z]"),length)

When I apply the above function, my output is

1 4 2

But my desired output is

[1] G -1

[2] G -1 C -1 T -2

[3] G -2

eddi
dondapati

2 Answers


You can extract all capital letters and then compute the frequencies with table:

library(stringr)
lapply(str_extract_all(t, "[A-Z]"), table)
# [[1]]
# 
# G 
# 1 
# 
# [[2]]
# 
# C G T 
# 1 1 2 
# 
# [[3]]
# 
# G 
# 2 
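
If you would rather not load stringr, roughly the same result can be sketched in base R. This is only an illustration, not part of the answer above: `caps` and `caps_tables` are names chosen here, and `regmatches()` with `gregexpr()` stands in for `str_extract_all()`.

```r
# Same input vector as in the question
t <- c("gctaggggggatggttactactGtgctatggactac",
       "gGaagggacggttactaCgTtatggactacT",
       "gcGaggggattggcttacG")

# gregexpr() locates every capital letter; regmatches() pulls the matches out,
# giving one character vector per input string
caps <- regmatches(t, gregexpr("[A-Z]", t))

# One frequency table per input string
caps_tables <- lapply(caps, table)
caps_tables
```

The printed tables should match the stringr output shown above, since `table()` is doing the counting in both cases.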
talat

If you extend docendo's answer to produce your exact requested format:

lapply(stringr::str_extract_all(t, "[A-Z]"), 
       function(x) {
         x = table(x)
         paste(names(x), x, sep = "-")
       })

# [[1]]
# [1] "G-1"
# 
# [[2]]
# [1] "C-1" "G-1" "T-2"
# 
# [[3]]
# [1] "G-2"
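
Note that `table()` sorts the letters alphabetically, whereas the question lists them in order of first appearance (G, C, T) and on a single line per string. A possible sketch for that layout follows; `format_caps` is a hypothetical helper name, and base R's `regmatches()` is used here instead of `str_extract_all()` only to keep the example dependency-free.

```r
t <- c("gctaggggggatggttactactGtgctatggactac",
       "gGaagggacggttactaCgTtatggactacT",
       "gcGaggggattggcttacG")

# Hypothetical helper: count letters, keeping the order of first appearance
format_caps <- function(x) {
  # Fixing the factor levels to unique(x) stops table() from re-sorting
  counts <- table(factor(x, levels = unique(x)))
  # Glue "letter -count" pairs into a single string
  paste(names(counts), counts, sep = " -", collapse = " ")
}

caps <- regmatches(t, gregexpr("[A-Z]", t))
res <- vapply(caps, format_caps, character(1))
res
# [1] "G -1"           "G -1 C -1 T -2" "G -2"
```

A string with no capital letters would come back as an empty string under this sketch.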

And here is how I would do it in the tidyverse:

library(tidyverse)
data = data.frame(strings = c("gctaggggggatggttactactGtgctatggactac", "gGaagggacggttactaCgTtatggactacT", "gcGaggggattggcttacG"))
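data  %>%
  # extract the capital letters of each string into a list-column
  mutate(caps_freq = stringr::str_extract_all(strings, "[A-Z]"),
         # turn each set of letters into a small frequency data frame
         caps_freq = map(caps_freq, function(letter) data.frame(table(letter)))) %>%
  # name the list-column explicitly; a bare unnest() is deprecated in tidyr >= 1.0
  unnest(caps_freq)
#                                strings letter Freq
# 1 gctaggggggatggttactactGtgctatggactac      G    1
# 2      gGaagggacggttactaCgTtatggactacT      C    1
# 3      gGaagggacggttactaCgTtatggactacT      G    1
# 4      gGaagggacggttactaCgTtatggactacT      T    2
# 5                  gcGaggggattggcttacG      G    2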
zacdav
  • What is the difference from @docendo's answer?? I don't see it (other than doing a `paste` at the end) - Same answer imo – Sotos Oct 17 '17 at 07:15
  • Because this is the requested output. As I quite clearly said, I extended his answer... – zacdav Oct 17 '17 at 07:21
  • The correct thing to do was to comment under his answer that an additional step can be added to accommodate the pasting part. Re-posting the same answer just to add a line of code sounds a bit like plagiarism. The `tidyverse` addition however, makes it OK as a new answer (Which I also like its output better to be honest) – Sotos Oct 17 '17 at 07:26