
I have the following string:

string <- c("ABDSFGHIJLKOP")

and list of substrings:

sub <- c("ABDSF", "SFGH", "GHIJLKOP")

I would like to wrap each substring match in < and >, thus getting:

<ABD><SF><GH><GHIJKOP>

I have tried the following code, which loops over the list of substrings and replaces each match, but as soon as ABDSF is matched, SFGH is no longer recognised because of the inserted < > characters. Does anybody have a better idea?

library(stringr)
library(magrittr)

string <- c("ABDSFGHIJLKOP")
sub <- c("ABDSF", "SFGH", "GHIJLKOP")

# Wrap each substring in <> in turn; every pass overwrites `string`
for (s in sub) {
  string %<>% str_replace_all(., s, paste0("<", s, ">"))
}

print(string)


Result: [1] "<ABDSF><GHIJLKOP>"

EDIT: The problem with the above code is that as soon as the < > characters are inserted after the first match, the second substring SFGH is no longer recognised, because the string is now:

 <ABDSF>GHIJLKOP. 

So I am looking for a way to match the substrings ignoring the <> characters.

Nivel
  • I don't understand how you get your expected output. Can you elaborate a bit more please? – Sotos Dec 28 '18 at 13:30
  • Possible duplicate of [Insert a character at a specific location in a string](https://stackoverflow.com/questions/13863599/insert-a-character-at-a-specific-location-in-a-string) – NelsonGon Dec 28 '18 at 13:31
  • Not exactly a duplicate of the above – Sotos Dec 28 '18 at 13:32
  • Look at the third-last answer (by Zach Foster) in that question. It might help. – NelsonGon Dec 28 '18 at 13:46
  • Using the function in said question. You can use it as you wish: `insert_str(mystring,c("<",">"),c(0,4))` yields: SFGHIJLKOP" – NelsonGon Dec 28 '18 at 13:53
  • shouldn't the expected output be: `` ? – Shique Dec 28 '18 at 13:55
  • Yes. I just used a sample index. OP can index all values as they wish. Of course it will get tiresome for very long strings but it's a start. – NelsonGon Dec 28 '18 at 13:56
  • Ah, I was actually referring to the OP: his expected output does not match his criteria, so I'm a bit confused. – Shique Dec 28 '18 at 14:05
  • I'm sorry that I can't write the code in R, but you could keep looping over each substring in your list (as you are already doing) and concatenate the results into a new variable instead of modifying your string variable. – lagripe Dec 28 '18 at 16:41

2 Answers

3

Place [<>]* between successive characters in each element of subs and then perform the substitutions with those patterns. No packages are used.

# test input
string <- "ABDSFGHIJLKOP"
subs <- c("ABDSF", "SFGH", "GHIJLKOP")

pats <- paste0("(", gsub("(.)(?=.)", "\\1[<>]*", subs, perl = TRUE), ")")
s <- string
for(p in pats) s <- gsub(p, "<\\1>", s)
s
## [1] "<ABD<SF><GH>IJLKOP>"
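To see what the substitution loop is actually matching against, it can help to print the generated patterns. The snippet below is only an illustrative sketch: it reuses `subs` from the test input and assumes that `[<>]*` is inserted after every character except the last one, which is the behaviour the output above corresponds to.

```r
subs <- c("ABDSF", "SFGH", "GHIJLKOP")

# Append "[<>]*" to every character that is followed by another character,
# so that "<" or ">" markers inserted by an earlier pass cannot break a later match.
inner <- gsub("(.)(?=.)", "\\1[<>]*", subs, perl = TRUE)

# Wrap each pattern in a capture group so the replacement can refer to it as \\1
pats <- paste0("(", inner, ")")

cat(pats, sep = "\n")
## (A[<>]*B[<>]*D[<>]*S[<>]*F)
## (S[<>]*F[<>]*G[<>]*H)
## (G[<>]*H[<>]*I[<>]*J[<>]*L[<>]*K[<>]*O[<>]*P)
```

Because the captured text includes any markers that were already inserted, later matches end up nested inside or overlapping earlier ones, which explains the shape of the result.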

Update

Regarding the comment below if I understand correctly we could add (?<=[EF]) giving:

pats <- paste0("(", gsub("(?<=[EF])(.)(?=.)", "\\1[<>]*", subs, perl = TRUE), ")")
s <- string
for(p in pats) s <- gsub(p, "<\\1>", s)
s
## [1] "<ABDSF><GHIJLKOP>"
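As a rough check of what the lookbehind changes, the sketch below prints the patterns it generates for the same `subs`. Only a character that directly follows an E or F (and is not the last character) gets `[<>]*` appended, so for this input only the G in "SFGH" is affected.

```r
subs <- c("ABDSF", "SFGH", "GHIJLKOP")

# "[<>]*" is appended only to characters preceded by E or F
# and followed by at least one more character.
inner <- gsub("(?<=[EF])(.)(?=.)", "\\1[<>]*", subs, perl = TRUE)

cat(inner, sep = "\n")
## ABDSF
## SFG[<>]*H
## GHIJLKOP
```

After the first substitution the string is "<ABDSF>GHIJLKOP", where a ">" sits between F and G. The second pattern tolerates markers only between G and H, so "SFGH" no longer matches, and only the first and third substrings are wrapped.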
G. Grothendieck
  • Is there any way you could add a conditional statement? For instance, only add <> when the preceding character is either an E or an F? Then "SFGH" would not be matched because the previous character is a D and not an E or F. – Nivel Jan 23 '19 at 16:52
  • Thanks a lot for your help, it is really appreciated! The problem that I still have is that it doesn't work if you, for instance, want to exclude "GHIJLKOP" by changing [EF] to [E]. Result = . I am looking for GHIJLKOP. – Nivel Jan 24 '19 at 18:00
0
# R version 3.3.2

library(stringr)

string <- c("ABDSFGHIJLKOP")
sub <- c("ABDSF", "SFGH", "GHIJLKOP")
result <- c("")
for (s in sub) {
  # Search the original, unmodified string for this substring
  temp <- str_extract(string, s)
  # str_extract() returns NA (not NULL) when there is no match
  if (!is.na(temp)) {
    temp <- paste("<", temp, ">", sep = "")
    result <- paste(result, temp, sep = "")
  }
}
print(result)

Result :

[1] "<ABDSF><SFGH><GHIJLKOP>"

Tested in Rextester
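The same idea can be sketched in base R without a loop. This is an alternative, not the code above: it assumes the substrings should be matched literally (`fixed = TRUE`) rather than as regular expressions, and the name `found` is introduced here only for illustration.

```r
string <- "ABDSFGHIJLKOP"
sub <- c("ABDSF", "SFGH", "GHIJLKOP")

# Keep only the substrings that occur literally in the original string;
# the original string is never modified, so overlapping matches still work.
found <- sub[vapply(sub, grepl, logical(1), x = string, fixed = TRUE)]

# Wrap each hit in <> and join the pieces in the order of `sub`
result <- paste0("<", found, ">", collapse = "")
print(result)
## [1] "<ABDSF><SFGH><GHIJLKOP>"
```

Like the loop version, this concatenates the wrapped substrings in the order they are listed rather than in the order they appear in the string.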

lagripe