
The stringr package has helpful str_replace() and str_replace_all() functions. For example

library(stringr)

mystring <- "one fish two fish red fish blue fish"

str_replace(mystring, "fish", "dog") # replaces the first occurrence
str_replace_all(mystring, "fish", "dog") # replaces all occurrences

Awesome. But how do you

  1. Replace the 2nd occurrence of "fish"?
  2. Replace the last occurrence of "fish"?
  3. Replace the 2nd to last occurrence of "fish"?
Ben
  • 2 is easy: `str_replace(mystring, 'fish$', 'dog')`. The others depend on how much you know about the string. Also, aside from slightly different regex interpretation, you're not really getting anything beyond `sub` and `gsub` here. – alistaire Apr 02 '16 at 03:22
  • Is it mandatory to use `str_replace()`? – rock321987 Apr 02 '16 at 03:43
  • @rock321987 yes, I would like an answer that uses `str_replace()` – Ben Apr 02 '16 at 03:47

3 Answers


For the first and last occurrences, we can use stri_replace from stringi, as it has a mode option:

 library(stringi)
 stri_replace(mystring, fixed="fish", "dog", mode="first")
 #[1] "one dog two fish red fish blue fish"

 stri_replace(mystring, fixed="fish", "dog", mode="last")
 #[1] "one fish two fish red fish blue dog"

The mode argument only accepts 'first', 'last' and 'all', so other occurrences are not supported by the function directly. For those we need a regex.
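For an nth occurrence counted from either end, one option that stays within stringi is to locate every fixed match with stri_locate_all_fixed() and overwrite only the chosen one with the stri_sub() replacement function. This is a minimal sketch, not part of stringi itself: replace_nth_fixed is a hypothetical helper, it handles a single string, and a negative n is assumed to count from the end (-1 = last, -2 = 2nd to last).

```r
library(stringi)

mystring <- "one fish two fish red fish blue fish"

# Hypothetical helper: replace the nth fixed-string match in a single string.
# Positive n counts from the start, negative n from the end (-1 = last).
replace_nth_fixed <- function(string, pattern, replacement, n) {
  loc <- stri_locate_all_fixed(string, pattern)[[1]]  # one row per match: start, end
  if (is.na(loc[1, "start"])) return(string)          # no match at all
  k <- nrow(loc)
  i <- if (n > 0) n else k + n + 1                    # convert to a position from the start
  if (i < 1 || i > k) return(string)                  # out of range: leave unchanged
  stri_sub(string, loc[i, "start"], loc[i, "end"]) <- replacement
  string
}

replace_nth_fixed(mystring, "fish", "dog", 2)
#[1] "one fish two dog red fish blue fish"
replace_nth_fixed(mystring, "fish", "dog", -2)
#[1] "one fish two fish red dog blue fish"
```

Because the match positions come from a fixed-string search, no regex escaping is needed for the pattern.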

Using sub, we can replace the nth occurrence of the word; for example, the 2nd:

sub("^((?:(?!fish).)*fish(?:(?!fish).)*)fish",
    "\\1dog", mystring, perl=TRUE)
#[1] "one fish two dog red fish blue fish"
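The (?:(?!fish).)* piece consumes text one character at a time but never steps over the start of another "fish". Repeating "skip up to and including a fish" n - 1 times generalizes the pattern to any n, including n = 1. A sketch, where tempered_nth is a hypothetical helper and word is assumed to contain no regex metacharacters:

```r
mystring <- "one fish two fish red fish blue fish"

# Hypothetical helper: build a PCRE pattern whose group 1 is everything
# before the nth occurrence of `word` (assumes `word` has no metacharacters).
tempered_nth <- function(word, n) {
  stopifnot(n >= 1)
  not_word <- paste0("(?:(?!", word, ").)*")  # any text that never starts `word`
  paste0("^((?:", not_word, word, "){", n - 1, "}", not_word, ")", word)
}

sub(tempered_nth("fish", 1), "\\1dog", mystring, perl = TRUE)
#[1] "one dog two fish red fish blue fish"
sub(tempered_nth("fish", 3), "\\1dog", mystring, perl = TRUE)
#[1] "one fish two fish red dog blue fish"
```

If there are fewer than n occurrences the pattern simply fails to match, so sub returns the string unchanged.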

Or we can use a quantifier. Note that {2} skips the first two occurrences, so this replaces the 3rd:

 sub('^((.*?fish.*?){2})fish', "\\1dog", mystring, perl=TRUE)
 #[1] "one fish two fish red dog blue fish"

For convenience, we can create a function that builds this pattern

patfn <- function(n){
 stopifnot(n>1)
 sprintf("^((.*?\\bfish\\b.*?){%d})\\bfish\\b", n-1)
} 

and then replace the nth occurrence of 'fish' for any n > 1 (the first occurrence is easily handled by plain sub or by the default behavior of str_replace):

sub(patfn(2), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two dog red fish blue fish"
sub(patfn(3), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red dog blue fish"
sub(patfn(4), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red fish blue dog"

This also works with str_replace:

 str_replace(mystring, patfn(2), "\\1dog")
 #[1] "one fish two dog red fish blue fish"
 str_replace(mystring, patfn(3), "\\1dog")
 #[1] "one fish two fish red dog blue fish"

Building on the pattern and replacement above, we can write a more general function that also covers n = 1:

replacerFn <- function(String, word, rword, n){
  stopifnot(n > 0)
  # lazily skip n - 1 whole-word occurrences, then match the nth one
  pat <- sprintf(paste0("^((.*?\\b", word, "\\b.*?){%d})\\b",
                        word, "\\b"), n - 1)
  rpat <- paste0("\\1", rword)
  if (n > 1) {
    stringr::str_replace(String, pat, rpat)
  } else {
    # first occurrence: a plain str_replace is enough
    stringr::str_replace(String, word, rword)
  }
}


 replacerFn(mystring, "fish", "dog", 1)
 #[1] "one dog two fish red fish blue fish"
 replacerFn(mystring, "fish", "dog", 2)
 #[1] "one fish two dog red fish blue fish"
 replacerFn(mystring, "fish", "dog", 3)
 #[1] "one fish two fish red dog blue fish"
 replacerFn(mystring, "fish", "dog", 4)
 #[1] "one fish two fish red fish blue dog"
akrun
  • I see it's easy to replace the nth occurrence with a new string, but how do you replace it with the same occurrence with an additional string? For example replace the 2nd fish with '#fish#'. I can't seem to figure it out as using \1 doesn't work like usual. I tried \\1\\1, but that doesn't work. – James Marquez Apr 13 '17 at 15:15
  • @JamesMarquez Try `sub("^((?:(?!fish).)*fish(?:(?!fish).)*)fish", "\\1#fish#", mystring, perl=TRUE)# [1] "one fish two #fish# red fish blue fish"` – akrun Apr 13 '17 at 15:19
  • Thank you @akrun. I'm trying to add html tags around my nth capture, but the match can be different. For example, I'm trying to add `` tags around my 2nd capture which could be either `(cat|dog)`. – James Marquez Apr 13 '17 at 15:21
  • @JamesMarquez Could you please post as a new question as I almost forgot about the context of this question :-) – akrun Apr 13 '17 at 15:25
  • Here's the link to the new question [Regexp Replace - Append String to Second Occurrence Using R's Sub](http://stackoverflow.com/questions/43396526/regexp-replace-append-string-to-second-occurrence-using-rs-sub) – James Marquez Apr 13 '17 at 15:42
  • As the variation 'nth to last' hasn't been answered yet I allowed myself to post a new question. Maybe you akrun, could help out here again?! https://stackoverflow.com/questions/58859811/replace-nth-to-last-occurrence-of-word-in-string-text – meier_flo Nov 14 '19 at 19:43
  • @meier_flo I posted a solution there – akrun Nov 14 '19 at 20:28
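The nth-to-last case (question 3) can also be handled with str_replace() by counting the whole-word matches with str_count(), converting "nth from the end" into "kth from the start", and reusing the lazy {k-1} skip pattern from the answer above. A sketch for a single string; replace_nth_last is a hypothetical name, and word is assumed to be a plain word with no regex metacharacters:

```r
library(stringr)

mystring <- "one fish two fish red fish blue fish"

# Hypothetical helper: replace the nth-to-last whole-word occurrence of `word`
# in a single string (n = 1 is the last occurrence).
replace_nth_last <- function(string, word, rword, n) {
  total <- str_count(string, paste0("\\b", word, "\\b"))  # number of whole-word matches
  k <- total - n + 1                                     # same occurrence, counted from the start
  if (n < 1 || k < 1) return(string)                     # out of range: leave unchanged
  # group 1 = everything before the kth match (lazily skip k - 1 matches)
  pat <- paste0("^((?:.*?\\b", word, "\\b){", k - 1, "}.*?)\\b", word, "\\b")
  str_replace(string, pat, paste0("\\1", rword))
}

replace_nth_last(mystring, "fish", "dog", 2)
#[1] "one fish two fish red dog blue fish"
```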

A useful answer depends a lot on the string and what you know about it. With regex, one option is to build a regex that matches the whole line, but in different pieces, so you can put the pieces you like back in:

str_replace(mystring, '(^.*?fish.*?)(fish)(.*?fish.*)', '\\1dog\\3')
# [1] "one fish two dog red fish blue fish"

where \\1 and \\3 in the replacement refer to the first and third capture groups, respectively. Note the lazy (ungreedy) quantifiers *?, which are important so you don't overmatch.

You can do the same thing to match the third or fourth occurrence, of course:

str_replace(mystring, '(^.*?fish.*?fish.*?)(fish)(.*)', '\\1dog\\3')
# [1] "one fish two fish red dog blue fish"
str_replace(mystring, '(^.*?fish.*?fish.*?fish.*?)(fish)(.*?)', '\\1dog\\3')
# [1] "one fish two fish red fish blue dog"

Spelling out every occurrence gets repetitive, though. You can use a quantifier to repeat a group instead, but it makes numbering the replacement groups a little confusing:

str_replace(mystring, '^((.*?fish.*?){3})(fish)(.*?)', '\\1dog\\4')
# [1] "one fish two fish red fish blue dog"

But if you make the repeated group non-capturing with (?: ... ), the numbering makes more sense:

str_replace(mystring, '^((?:.*?fish.*?){3})(fish)(.*?)', '\\1dog\\3')
# [1] "one fish two fish red fish blue dog"
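To see why the numbering changes, str_match() can show what each group captures (column 1 is the whole match). With the capturing version, the repeated group becomes group 2 and holds only its last repetition, which pushes "fish" to group 3 and the tail to group 4; the non-capturing version leaves just three groups:

```r
library(stringr)

mystring <- "one fish two fish red fish blue fish"

# Column 1 is the whole match; the remaining columns are the capture groups.
capturing    <- str_match(mystring, '^((.*?fish.*?){3})(fish)(.*?)')
noncapturing <- str_match(mystring, '^((?:.*?fish.*?){3})(fish)(.*?)')

capturing[1, -1]     # 4 groups: group 2 keeps only the last repetition
noncapturing[1, -1]  # 3 groups: prefix, "fish", rest
```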

All of this is a lot of regex, though. A simpler option (depending on the context and how much you like regex, I suppose) may be to use strsplit and then recombine, collapsing the pieces separately:

mystrlist <- strsplit(mystring, 'fish ')[[1]] # match the space so not the last "fish$"
paste0(c(mystrlist[1], 
         paste0(mystrlist[2:3], collapse = 'dog '), 
         mystrlist[4]), 
       collapse = 'fish ')
# [1] "one fish two dog red fish blue fish"

paste0(c(mystrlist[1:2], 
         paste0(mystrlist[3:4], collapse = 'dog ')), 
       collapse = 'fish ')
# [1] "one fish two fish red dog blue fish"
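A related base-R option, swapped in here rather than taken from this answer, is to let gregexpr() find every match and then overwrite only the nth one through the regmatches<- replacement function. A minimal sketch; replace_nth_match is a hypothetical name and it handles a single string:

```r
mystring <- "one fish two fish red fish blue fish"

# Hypothetical helper: replace only the nth regex match in a single string.
replace_nth_match <- function(string, pattern, replacement, n) {
  m <- gregexpr(pattern, string)          # positions of every match
  pieces <- regmatches(string, m)[[1]]    # the matched substrings themselves
  if (n < 1 || n > length(pieces)) return(string)
  pieces[n] <- replacement                # change just the nth match
  regmatches(string, m) <- list(pieces)   # write all matches back in place
  string
}

replace_nth_match(mystring, "fish", "dog", 3)
#[1] "one fish two fish red dog blue fish"
```

Unlike the split-and-paste approach, this also handles the last occurrence, since nothing depends on the trailing space.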

This doesn't work terribly well for the last word, of course, but the end-of-line regex token $ makes using str_replace (or just sub) really easy for that purpose:

sub('fish$', 'dog', mystring)
# [1] "one fish two fish red fish blue dog"
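The $ anchor only helps when the last "fish" is also the end of the string. If other text may follow it, a greedy (.*) prefix works instead: .* first runs to the end of the string and then backtracks to the last position where "fish" still matches, and \\1 puts the prefix back:

```r
library(stringr)

other <- "fish soup, then fish pie for dinner"

# Greedy .* backtracks to the LAST "fish"; \\1 restores everything before it.
sub("(.*)fish", "\\1dog", other)
#[1] "fish soup, then dog pie for dinner"
str_replace(other, "(.*)fish", "\\1dog")
#[1] "fish soup, then dog pie for dinner"
```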

Bottom line: the best choice depends a lot on the context, but sadly there is no extra parameter for choosing which match to replace.

alistaire

stringr is designed to work on character vectors. It does not have functions for working inside a single vector element at any great level of detail. But an easy approach is to split the string into a character vector of pieces, apply stringr functions to this vector (since this is what stringr is really good at), then join the vector back into a single string. These steps can, of course, be wrapped in a function.

This method can be applied whenever something needs to be done within an individual string.

For the example provided here, the suitable subsets are individual words.

So, to replace the nth occurrence of a word in a string:

library(stringr)

replace_function <- function(string, word, rword, n) {
  vec <- unlist(strsplit(string, " "))
  vec[str_which(vec, word)[n]] <- rword
  str_c(vec, collapse = " ")
}

replace_function(mystring, "fish", "dog", 1)
[1] "one dog two fish red fish blue fish"

replace_function(mystring, "fish", "dog", 2)
[1] "one fish two dog red fish blue fish"
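One caveat: str_which() treats word as a regular expression and matches substrings, so "fish" would also count "fishing" or "catfish". If only whole words should count, a variant can compare the split words exactly instead. A sketch; replace_word_exact is a hypothetical name:

```r
library(stringr)

# Hypothetical variant: only words exactly equal to `word` are counted.
replace_word_exact <- function(string, word, rword, n) {
  vec <- unlist(strsplit(string, " "))
  hits <- which(vec == word)              # exact matches only, no regex
  if (n >= 1 && n <= length(hits)) vec[hits[n]] <- rword
  str_c(vec, collapse = " ")
}

replace_word_exact("one fish two fishing red fish", "fish", "dog", 2)
#[1] "one fish two fishing red dog"
```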

Replacing the nth-from-last occurrence is easy: just add rev():

replace_end_function <- function(string, word, rword, n) {
  vec <- unlist(strsplit(string, " "))
  vec[rev(str_which(vec, word))[n]] <- rword
  str_c(vec, collapse = " ")
}

replace_end_function(mystring, "fish", "dog", 1)
[1] "one fish two fish red fish blue dog"

replace_end_function(mystring, "fish", "dog", 2)
[1] "one fish two fish red dog blue fish"

And to replace every occurrence from the nth through the last (given its own name so it does not overwrite replace_end_function):

replace_from_function <- function(string, word, rword, n) {
  vec <- unlist(strsplit(string, " "))
  idx <- str_which(vec, word)        # positions of all matching words
  vec[idx[n:length(idx)]] <- rword   # replace the nth through the last
  str_c(vec, collapse = " ")
}

replace_from_function(mystring, "fish", "dog", 1)
[1] "one dog two dog red dog blue dog"

replace_from_function(mystring, "fish", "dog", 2)
[1] "one fish two dog red dog blue dog"

replace_from_function(mystring, "fish", "dog", 3)
[1] "one fish two fish red dog blue dog"

replace_from_function(mystring, "fish", "dog", 4)
[1] "one fish two fish red fish blue dog"

Note that this answer does not use str_replace(), as the OP had asked for, because, as the OP noted, str_replace() only replaces the first match and str_replace_all() replaces all of them. So they are not the most appropriate stringr functions for this question: indexing with the result of str_which() is much more suitable (once the individual string has been split into a vector of words, of course).
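For reference, str_replace() is vectorized over its input: it replaces the first match inside every element of a character vector, not just in the first element, as a quick check shows:

```r
library(stringr)

x <- c("red fish blue fish", "fish and more fish")

str_replace(x, "fish", "dog")      # first match in each element
str_replace_all(x, "fish", "dog")  # every match in each element
```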

prosoitos