
I have the following vector:

dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")

I would like to extract everything after the second colon (:).

The result I would like is A:G, A:G, T:G.

What would be the solution for this?

Yamuna_dhungana

3 Answers

1

We can use sub to match, from the start of the string (^), two instances of one or more characters that are not a colon ([^:]+) followed by a colon, and replace the match with an empty string (""):

sub("^([^:]+:){2}", "", dt)
#[1] "A:G" "A:G" "T:G"
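
The same pattern extends to any number of leading fields. The following is only a sketch: after_nth_colon is a hypothetical helper name, not part of base R or stringr, and it assumes every string contains at least n colon-terminated fields.

```r
dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")

# Hypothetical helper: drop the first n colon-terminated fields of each string.
after_nth_colon <- function(x, n = 2) {
  # Build the pattern from the answer above with a variable repeat count.
  pattern <- sprintf("^([^:]+:){%d}", n)
  sub(pattern, "", x)
}

res2 <- after_nth_colon(dt)         # everything after the second colon
res3 <- after_nth_colon(dt, n = 3)  # everything after the third colon
print(res2)
print(res3)
```

Strings with fewer than n colons do not match the pattern, so sub returns them unchanged.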

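It can also be done with trimws, which here strips every leading and trailing digit, hyphen, and colon instead of counting colons (the whitespace argument requires R 3.6.0 or later):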

trimws(dt, whitespace = "[-0-9:]")
#[1] "A:G" "A:G" "T:G"

Or use str_remove from stringr with the same pattern:

library(stringr)
str_remove(dt, "^([^:]+:){2}")
#[1] "A:G" "A:G" "T:G"
akrun
1

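You can use sub: capture the part you want to keep in a capturing group (...) and refer back to it with \\1 in the replacement argument. Note that the leading . matches exactly one character, so this pattern assumes a single-character first field, as in the sample data: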

sub("^.:[^:]+:(.:.)", "\\1", dt, perl = TRUE)
[1] "A:G" "A:G" "T:G"

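Alternatively, you can use str_extract with a positive lookbehind (?<=...), which returns the first pair of uppercase letters separated by a colon that is immediately preceded by a colon: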

library(stringr)
str_extract(dt, "(?<=:)[A-Z]:[A-Z]")
[1] "A:G" "A:G" "T:G"
Chris Ruehlemann
0

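Or simply use str_split from stringr. With n = 3 it splits the string into at most three pieces, so the third piece holds everything after the second colon:

library(stringr)
str_split("1:7984985:A:G", ":", n = 3)[[1]][3]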
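
The call above handles a single string. As a rough base R sketch of the same splitting idea applied to the whole vector (the variable names pieces and res are illustrative, and each string is assumed to contain at least two colons):

```r
dt <- c("1:7984985:A:G", "1:7984985-7984985:A:G", "1:7984985-7984985:T:G")

# Split every string on the literal colon; the result is a list of character vectors.
pieces <- strsplit(dt, ":", fixed = TRUE)

# Drop the first two fields of each element and glue the rest back together.
res <- vapply(
  pieces,
  function(p) paste(p[-(1:2)], collapse = ":"),
  character(1)
)
print(res)
```

Unlike the n = 3 version, this splits on every colon and then re-joins the tail, so it needs no packages beyond base R.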