Easy way to extract uppercase letters from a string in R

Question

I am a beginner programmer in R.

I have "cCt/cGt" and I want to extract C and G and write it like C>G.

library(stringr)
test <- "cCt/cGt"
str_extract(test, "[A-Z]+$")  # returns NA: the string ends in a lowercase letter

So, you have a `cCt/cGt` string, what do you need as output? A `cGt/cGt` string? — Wiktor Stribiżew, Jun 04 '22 at 11:38

Chris Ruehlemann · Answer 1 · 2022-06-04T11:39:56.317

4

Try this:

gsub(".*([A-Z]).*([A-Z]).*", "\\1>\\2", test )
[1] "C>G"

Here, we capture the two occurrences of the upper case letters in capturing groups given in parentheses (...). This enables us to refer to them (and only to them but not the rest of the string!) in gsub's replacement clause using backreferences \\1 and \\2. In the replacement clause we also include the desired >.
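To see what each capturing group holds, here is a minimal base R sketch. The second input string is a made-up example (only "cCt/cGt" comes from the question), and the pattern assumes each string contains exactly two uppercase letters.

```r
# "Gcc/Acc" is a hypothetical extra input for illustration.
codons <- c("cCt/cGt", "Gcc/Acc")
pattern <- ".*([A-Z]).*([A-Z]).*"

# regexec() + regmatches() return the full match followed by each group
groups <- regmatches(codons, regexec(pattern, codons))
groups[[1]]

# The pattern consumes the whole string, so one substitution per string suffices
result <- sub(pattern, "\\1>\\2", codons)
result
```

Because the pattern matches the entire string, everything outside the two groups is discarded by the replacement, and sub() gives the same result as gsub() here.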

edited Jun 04 '22 at 11:39

answered Jun 04 '22 at 11:34


clemenskuehn · Accepted Answer · 2022-06-04T12:03:43.320

You seem to be looking for a mutation in two concatenated strings. This function should solve your problem:

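extract_mutation <- function(text){
  # split the string at "/" into its two parts
  splitted <- strsplit(text, split = "/")[[1]]
  # position of the first uppercase letter in each part
  pos <- regexpr("[[:upper:]]", splitted)
  # extract the matched uppercase letters
  uppercases <- regmatches(splitted, pos)
  # join them with ">", e.g. "C>G"
  mutation <- paste0(uppercases, collapse = ">")
  return(mutation)
}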

If the two exchanged bases are always at the same index, you could also return the position:

position <- pos[1]
return(list(mutation, position))

Use these two lines in place of return(mutation).
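Put together, the position-returning variant might look like the sketch below. The function name extract_mutation_with_position and the list element names are illustrative choices, not part of the answer, and the sketch assumes both parts contain an uppercase letter at the same index.

```r
# Hypothetical name; a sketch of the variant that also returns the position.
extract_mutation_with_position <- function(text) {
  parts <- strsplit(text, split = "/", fixed = TRUE)[[1]]
  # position of the first uppercase letter in each part
  pos <- regexpr("[[:upper:]]", parts)
  mutation <- paste0(regmatches(parts, pos), collapse = ">")
  # assumes both parts mutate at the same index, so the first position is enough
  list(mutation = mutation, position = as.integer(pos[1]))
}

out <- extract_mutation_with_position("cCt/cGt")
out$mutation
out$position
```

Naming the list elements lets callers write out$mutation and out$position rather than relying on the order of the elements.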

score 0 · Answer 3 · answered Jun 04 '22 at 11:52

You might also capture the two uppercase characters: the first followed by optional lowercase characters, then a /, then optional lowercase characters preceding the second.

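library(stringr)
test <- "cCt/cGt"
# str_match() returns a matrix: the full match, then one column per capturing group
res <- str_match(test, "([A-Z])[a-z]*/[a-z]*([A-Z])")
sprintf("%s>%s", res[2], res[3])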

Output

[1] "C>G"


An exact match for the whole string could be:

^[a-z]([A-Z])[a-z]/[a-z]([A-Z])[a-z]$
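As a rough illustration, the anchored pattern can also be used without stringr. The helper name codon_change is hypothetical, and returning NA for strings that do not fit the pattern is an assumption about the desired behaviour.

```r
# Hypothetical helper: base R only, returns NA when the whole string does not match.
codon_change <- function(x) {
  pattern <- "^[a-z]([A-Z])[a-z]/[a-z]([A-Z])[a-z]$"
  # regexec() yields the full match and both groups, or nothing on failure
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0) return(NA_character_)
  sprintf("%s>%s", m[2], m[3])
}

ok  <- codon_change("cCt/cGt")
bad <- codon_change("cct/cGt")  # no uppercase letter in the first part
ok
bad
```

The anchors ^ and $ make the match strict, so malformed input is flagged instead of yielding a partial result.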
