I am a beginner programmer in R.

I have the string "cCt/cGt" and I want to extract the uppercase letters C and G and write them as C>G.

library(stringr)
test <- "cCt/cGt"
# my attempt; returns NA because the string ends in a lowercase letter
str_extract(test, "[A-Z]+$")
RavinderSingh13
jean simon

3 Answers

Try this:

gsub(".*([A-Z]).*([A-Z]).*", "\\1>\\2", test )
[1] "C>G"

Here, the two uppercase letters are captured in capturing groups, written in parentheses (...). This lets us refer to them, and only to them, in gsub's replacement argument using the backreferences \\1 and \\2; the rest of the string is matched by the surrounding .* and discarded. The replacement also includes the desired >. This relies on the string containing exactly two uppercase letters, as in the example.
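
As a small illustration of the capture-group approach, the sketch below applies the same pattern to a vector of strings. Only the first element comes from the question; the other inputs and the names codons and changes are made up for the example.

```r
# Hypothetical input vector; only "cCt/cGt" appears in the question.
codons <- c("cCt/cGt", "Aag/Gag", "ccT/ccC")

# Each (...) group captures one uppercase letter; \\1 and \\2 in the
# replacement refer back to those captures, and the .* parts are dropped.
changes <- gsub(".*([A-Z]).*([A-Z]).*", "\\1>\\2", codons)
print(changes)
# [1] "C>G" "A>G" "T>C"

# If a string has no uppercase letters, nothing matches and gsub
# returns the input unchanged.
unchanged <- gsub(".*([A-Z]).*([A-Z]).*", "\\1>\\2", "cct/cgt")
print(unchanged)
# [1] "cct/cgt"
```

Because gsub is vectorised, no loop is needed. The unchanged result for a non-matching string is worth checking for if the input may contain entries without a mutation.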

Chris Ruehlemann

You seem to be looking for a mutation in two strings joined by a slash. This function should solve your problem:

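extract_mutation <- function(text) {
  # split on "/" into the two parts, e.g. "cCt" and "cGt"
  splitted <- strsplit(text, split = "/")[[1]]
  # position of the first uppercase letter in each part
  pos <- regexpr("[[:upper:]]", splitted)
  # extract the matched uppercase letters
  uppercases <- regmatches(splitted, pos)
  # join them with ">", e.g. "C>G"
  mutation <- paste0(uppercases, collapse = ">")
  return(mutation)
}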

If the exchanged base is always at the same index in both parts, you could also return its position:

position <- pos[1]
return(list(mutation, position))

These two lines replace return(mutation) in the function.
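
Put together, the variant that also returns the position might look like the sketch below. The function name extract_mutation_with_position and the list element names are assumptions made for illustration; as stated above, it assumes the uppercase letter sits at the same index in both parts.

```r
# Hypothetical combined version; the name and list fields are illustrative.
extract_mutation_with_position <- function(text) {
  parts <- strsplit(text, split = "/")[[1]]
  # position of the first uppercase letter in each part
  pos <- regexpr("[[:upper:]]", parts)
  uppercases <- regmatches(parts, pos)
  list(
    mutation = paste0(uppercases, collapse = ">"),
    # index within the first part, as a plain integer
    position = as.integer(pos[1])
  )
}

result <- extract_mutation_with_position("cCt/cGt")
print(result$mutation)
# [1] "C>G"
print(result$position)
# [1] 2
```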

clemenskuehn

You could also capture the two uppercase characters, allowing optional lowercase characters after the first and before the second, with a / in between.

library(stringr)
test = "cCt/cGt"
# res is a 1 x 3 matrix: full match, group 1, group 2
res = str_match(test, "([A-Z])[a-z]*/[a-z]*([A-Z])")
sprintf("%s>%s", res[2], res[3])

Output

[1] "C>G"


An exact match for the whole string could be:

^[a-z]([A-Z])[a-z]/[a-z]([A-Z])[a-z]$
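
One possible way to use the anchored pattern in base R is sketched below: check each string with grepl first, then extract with sub only where the whole string matches. The input vector and the names pattern, inputs, ok and result are made up for the example, and perl = TRUE is an added choice so that the character ranges do not depend on the locale.

```r
pattern <- "^[a-z]([A-Z])[a-z]/[a-z]([A-Z])[a-z]$"

# Hypothetical inputs: one well-formed string and two that do not fit.
inputs <- c("cCt/cGt", "cct/cGt", "cCt/cGtt")

# TRUE only where the entire string has the expected shape
ok <- grepl(pattern, inputs, perl = TRUE)

# Extract "X>Y" for valid strings, NA for the rest
result <- ifelse(ok,
                 sub(pattern, "\\1>\\2", inputs, perl = TRUE),
                 NA_character_)
print(result)
# [1] "C>G" NA    NA
```

Compared with the looser patterns, this rejects malformed input instead of silently returning it or a partial match.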
The fourth bird