2

I have a vector of strings like this:

test <- c("Dcl2_SsHV2L_2_CAAAAG_L003_R1_001", "Dcl2_SsHV2L_2_CAAAAG_L003_R1_001", 
"Dcl2_SsHV2L_2_CAAAAG_L003_R1_001", "Dcl1_SsHV2L_2_GGTAGC_L003_R1_001")

I need to delete everything after "SsHV2L" and get only:

Dcl2_SsHV2L
Dcl2_SsHV2L
Dcl2_SsHV2L
Dcl1_SsHV2L

I tried: gsub("SsHV2L.*","",test)

What is the proper way of doing it?

MAPK

2 Answers

3

You can just do

gsub("SsHV2L.+$", "SsHV2L", test)

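This matches "SsHV2L" together with everything that follows it up to the end of the string, then replaces the whole match with just "SsHV2L". Your attempt replaced the match with "", which deletes "SsHV2L" along with the rest.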
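To see the difference side by side, here is a minimal sketch using the vector from the question. The names attempt, fixed and fixed_sub are only illustrative, and the sub() variant is an alternative I am adding, not something the question requires.

```r
test <- c("Dcl2_SsHV2L_2_CAAAAG_L003_R1_001", "Dcl2_SsHV2L_2_CAAAAG_L003_R1_001",
          "Dcl2_SsHV2L_2_CAAAAG_L003_R1_001", "Dcl1_SsHV2L_2_GGTAGC_L003_R1_001")

# Replacing the match with "" also removes the matched "SsHV2L"
attempt <- gsub("SsHV2L.*", "", test)

# Using "SsHV2L" as the replacement puts the matched prefix back
fixed <- gsub("SsHV2L.+$", "SsHV2L", test)

# sub() is enough here: the match runs to the end of the string,
# so there can be only one match per element
fixed_sub <- sub("SsHV2L.*$", "SsHV2L", test)

print(attempt)
print(fixed)
```

Note that ".+" needs at least one character after "SsHV2L"; a string that ends exactly at "SsHV2L" is simply left unchanged, which is still the desired result.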

MrFlick
1
gsub("(^.+SsHV2L)(.+$)","\\1", test)
[1] "Dcl2_SsHV2L" "Dcl2_SsHV2L" "Dcl2_SsHV2L" "Dcl1_SsHV2L"

This uses a pattern with two capture groups. The first one ends at the target string "SsHV2L", the second one matches the rest, and only the first group is kept through the "\\1" backreference in the replacement.
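As a variation on the same idea, here is a hedged sketch of two other ways to keep the prefix: a single capture group, and a PCRE lookbehind (perl = TRUE) that leaves "SsHV2L" out of the match so nothing has to be put back. The names by_group and by_lookbehind are just illustrative.

```r
test <- c("Dcl2_SsHV2L_2_CAAAAG_L003_R1_001", "Dcl2_SsHV2L_2_CAAAAG_L003_R1_001",
          "Dcl2_SsHV2L_2_CAAAAG_L003_R1_001", "Dcl1_SsHV2L_2_GGTAGC_L003_R1_001")

# One capture group: keep everything up to and including "SsHV2L"
by_group <- sub("^(.*SsHV2L).*$", "\\1", test)

# Lookbehind: match only the text that follows "SsHV2L" and delete it
by_lookbehind <- sub("(?<=SsHV2L).*", "", test, perl = TRUE)

print(by_group)
print(by_lookbehind)
```

One thing to be aware of: the greedy ".*" before "SsHV2L" in the capture-group version anchors on the last occurrence of "SsHV2L" in a string, while the lookbehind version cuts after the first one. For the strings in the question there is only one occurrence, so both give the same result.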

IRTFM