I have my vector as
dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")
I would like to extract everything after the 2nd :.
The result I would like is
A:G, A:G, T:G
What would be the solution for this?
We can use sub to match two instances of one or more characters that are not a : ([^:]+) followed by a :, starting from the beginning (^) of the string, and replace them with a blank ("").
sub("^([^:]+:){2}", "", dt)
#[1] "A:G" "A:G" "T:G"
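Because the count lives in the {2} quantifier, the same pattern generalizes to "everything after the n-th :". The helper below is a hypothetical name used only for illustration; it builds the regex with sprintf and passes it to sub. Strings with fewer than n colons do not match and are returned unchanged.

```r
# Hypothetical helper (not part of the answer above): remove everything up to
# and including the n-th ":" by building "^([^:]+:){n}" for a given n.
remove_through_nth_colon <- function(x, n) {
  pattern <- sprintf("^([^:]+:){%d}", as.integer(n))
  sub(pattern, "", x)
}

dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")
remove_through_nth_colon(dt, 2)
#[1] "A:G" "A:G" "T:G"
remove_through_nth_colon(dt, 3)
#[1] "G" "G" "G"
```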
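It can also be done with trimws, which removes matching characters from both ends of each string rather than cutting at a fixed position (here any run of digits, - and :):
trimws(dt, whitespace = "[-0-9:]")
#[1] "A:G" "A:G" "T:G"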
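The whitespace argument of trimws was added in R 3.6.0. On older versions, the same character-class stripping can be sketched with gsub by anchoring the class at both ends. The second call uses a hypothetical input to show why this approach is not position-based: a trailing digit is stripped too.

```r
# Strip runs of digits, "-" and ":" from the start and the end of each string,
# which is what trimws(dt, whitespace = "[-0-9:]") does.
dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")
gsub("^[-0-9:]+|[-0-9:]+$", "", dt)
#[1] "A:G" "A:G" "T:G"

# Hypothetical input ending in a digit: the trailing ":1" is removed as well
gsub("^[-0-9:]+|[-0-9:]+$", "", "1:7984985:A:1")
#[1] "A"
```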
Or use str_remove from stringr:
library(stringr)
str_remove(dt, "^([^:]+:){2}")
#[1] "A:G" "A:G" "T:G"
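You can use sub, capture the items you want to retain in a capturing group (...), and refer back to them in the replacement argument to sub. Note that . matches exactly one character, so this pattern assumes a one-character chromosome and one-character alleles:
sub("^.:[^:]+:(.:.)", "\\1", dt, perl = TRUE)
#[1] "A:G" "A:G" "T:G"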
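The pattern above uses . for the chromosome and for each allele, so an input such as "10:7984985:A:G" would not match and would be returned unchanged. A sketch of the same capture-group idea that matches the first two fields with [^:]+ and captures whatever follows is shown below; the second input vector is hypothetical.

```r
# Skip the first two ":"-separated fields and capture the rest in group 1.
dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")
sub("^[^:]+:[^:]+:(.*)$", "\\1", dt)
#[1] "A:G" "A:G" "T:G"

# Hypothetical inputs: two-character chromosome and a multi-base allele.
# Returns "A:G" and "AT:G".
sub("^[^:]+:[^:]+:(.*)$", "\\1", c("10:7984985:A:G", "X:100-105:AT:G"))
```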
Alternatively, you can use str_extract with a positive lookbehind (?<=...):
library(stringr)
str_extract(dt, "(?<=:)[A-Z]:[A-Z]")
#[1] "A:G" "A:G" "T:G"
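The same lookbehind extraction can be sketched in base R without stringr. perl = TRUE is needed because R's default regex engine does not support lookbehind. One difference: regmatches() with regexpr() drops elements that have no match, whereas str_extract returns NA for them.

```r
# regexpr() finds the first match in each string; regmatches() pulls out the text.
dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")
regmatches(dt, regexpr("(?<=:)[A-Z]:[A-Z]", dt, perl = TRUE))
#[1] "A:G" "A:G" "T:G"
```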
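Or simply use str_split with n = 3. It splits each string into at most 3 pieces (the third piece keeps any remaining :) and returns a list with one character vector per input, so take the third piece of each:
library(stringr)
sapply(str_split(dt, ":", n = 3), `[`, 3)
#[1] "A:G" "A:G" "T:G"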
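A base R sketch of the split approach uses strsplit (with fixed = TRUE, since ":" is a literal separator) and pastes back together every piece after the second one, so any further : in the remainder is preserved.

```r
# Split on ":", drop the first two pieces, and re-join the rest with ":".
dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")
pieces <- strsplit(dt, ":", fixed = TRUE)
vapply(pieces, function(p) paste(p[-(1:2)], collapse = ":"), character(1))
#[1] "A:G" "A:G" "T:G"
```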