How to extract string after 2nd delimiter in R

Question

I have the following vector:

dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")

I would like to extract everything after the second colon (:).

The desired result is A:G, A:G, T:G.

What would be the solution for this?

score 1 · Accepted Answer · answered Jul 06 '20 at 20:34

We can use sub to match, from the start of the string (^), two repetitions ({2}) of one or more characters that are not a : ([^:]+) followed by a :, and replace that match with an empty string ("").

sub("^([^:]+:){2}", "", dt)
#[1] "A:G" "A:G" "T:G"
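
The {2} quantifier is what ties this pattern to the second delimiter, so the same idea extends to any n. The sketch below wraps it in a small function; after_nth() is a hypothetical helper written for illustration (it is not part of base R or stringr), and it assumes the delimiter is a single character with no special regex meaning, such as ":".

```r
# after_nth() is a hypothetical helper, not a base R or stringr function.
# It removes everything up to and including the n-th occurrence of `delim`,
# assuming `delim` is one character that is not a regex metacharacter.
after_nth <- function(x, n, delim = ":") {
  # Builds e.g. "^([^:]+:){2}" for n = 2
  pattern <- sprintf("^([^%s]+%s){%d}", delim, delim, as.integer(n))
  sub(pattern, "", x)
}

dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")
after_nth(dt, 2)  # "A:G" "A:G" "T:G"
after_nth(dt, 1)  # drops only the first field, e.g. "7984985:A:G"
after_nth("A", 2) # fewer than n delimiters: sub() leaves it unchanged, "A"
```

Because sub() returns non-matching strings unchanged, inputs with fewer than n delimiters pass through as-is rather than becoming NA or "".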

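It can also be done with trimws if the removal does not need to be based on position. The whitespace argument (available since R 3.6.0) makes trimws strip any leading and trailing characters from the class [-0-9:], i.e. digits, - and :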

trimws(dt, whitespace = "[-0-9:]")
#[1] "A:G" "A:G" "T:G"
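
To see what "not based on position" means in practice, the sketch below runs both approaches on an input whose first field is not numeric. The "X:100:A:G" string is a hypothetical example (a chromosome named X) and is not part of the question's data.

```r
# Hypothetical input: the second element has a non-numeric first field
# (chromosome "X"); it is not from the original question.
x <- c("1:7984985:A:G", "X:100:A:G")

# Positional: remove the first two ":"-terminated fields
by_position <- sub("^([^:]+:){2}", "", x)

# Character class: strip leading/trailing digits, "-" and ":" (R >= 3.6.0)
by_class <- trimws(x, whitespace = "[-0-9:]")

by_position  # "A:G" "A:G"
by_class     # "A:G" "X:100:A:G"  ("X" is not in the class, so nothing is stripped)
```

So trimws is a good fit only when the leading fields are guaranteed to consist of digits, - and :.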

Or use str_remove from stringr with the same pattern:

library(stringr)
str_remove(dt, "^([^:]+:){2}")
#[1] "A:G" "A:G" "T:G"

Chris Ruehlemann · Answer 2 · 2020-07-06T22:00:59.753

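You can use sub, capture the items you want to retain in a capturing group (...) and refer back to them in the replacement argument to sub. Note that . matches exactly one character, so this pattern assumes a one-character first field and single-character alleles: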

sub("^.:[^:]+:(.:.)", "\\1", dt, perl = TRUE)
[1] "A:G" "A:G" "T:G"
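
Since each . in the capture pattern matches exactly one character, a longer first field or allele makes the match fail and sub() returns the string untouched. The sketch below shows this on hypothetical inputs ("10" as a chromosome, "AT" as an allele, neither from the question) and contrasts it with a variant that uses [^:]+ for the first two fields and captures the rest with (.*).

```r
# Hypothetical input with a two-character chromosome ("10") and a
# two-letter allele ("AT"); not from the original question.
y <- c("1:7984985:A:G", "10:123:AT:G")

# Single-character pattern: "10:..." does not match, so it is returned unchanged
strict <- sub("^.:[^:]+:(.:.)", "\\1", y, perl = TRUE)

# More general: any run of non-":" characters for fields 1 and 2, capture the rest
general <- sub("^[^:]+:[^:]+:(.*)$", "\\1", y)

strict   # "A:G" "10:123:AT:G"
general  # "A:G" "AT:G"
```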

Alternatively, you can use str_extract and positive lookbehind (?<=...):

library(stringr)
str_extract(dt, "(?<=:)[A-Z]:[A-Z]")
[1] "A:G" "A:G" "T:G"
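
The same lookbehind also works in base R when perl = TRUE is set, by pairing regexpr() with regmatches(). One difference worth knowing: regmatches() drops elements that do not match, whereas str_extract returns NA for them. The lowercase "1:2:a:g" below is a hypothetical non-matching input added only to show that behaviour.

```r
dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")

# Base R equivalent of the lookbehind; perl = TRUE enables (?<=...)
alleles <- regmatches(dt, regexpr("(?<=:)[A-Z]:[A-Z]", dt, perl = TRUE))
alleles  # "A:G" "A:G" "T:G"

# Hypothetical non-matching element (lowercase is outside [A-Z]):
z <- c("1:7984985:A:G", "1:2:a:g")
dropped <- regmatches(z, regexpr("(?<=:)[A-Z]:[A-Z]", z, perl = TRUE))
dropped  # only "A:G"; the result is shorter than the input
```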

edited Jul 06 '20 at 22:00

answered Jul 06 '20 at 21:01

score 0 · Answer 3 · answered Jul 06 '20 at 21:09

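Or simply use str_split with n = 3, which splits each string into at most three pieces, so the third piece is everything after the second :. It returns a list with one character vector per input string, hence the [[1]][3] indexing:

library(stringr)
str_split("1:7984985:A:G", ":", n = 3)[[1]][3]
#[1] "A:G"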
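
The [[1]][3] indexing handles one string at a time. For the whole vector, a sketch using stringr's str_split_fixed(), which returns a character matrix, is shown below alongside a base R version. Base strsplit() has no limit on the number of pieces, so the base variant rejoins everything after the second piece with paste().

```r
library(stringr)

dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")

# stringr: split each string into at most 3 pieces and keep the third column
third_piece <- str_split_fixed(dt, ":", n = 3)[, 3]

# Base R: split on every ":" and glue back everything after piece 2
base_r <- vapply(strsplit(dt, ":", fixed = TRUE),
                 function(p) paste(p[-(1:2)], collapse = ":"),
                 character(1))

third_piece  # "A:G" "A:G" "T:G"
base_r       # "A:G" "A:G" "T:G"
```

If a string has fewer than three pieces, str_split_fixed() fills the missing columns with "", so the third column is an empty string rather than an error.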

Psyndrom Ventura
