I want to replace only exact (whole-word) matches of the terms in the word column of my data frame. In the example below, I am trying to replace the word java with xx, but the replacement also hits javascript, turning it into xxscript.

Current output:

data$new
[1] "xxscript is a statically typed and xx py is a dynamically typed"
[2] "xx is a programming language"

Code:

data = data.frame("word"=c('python', 'java'), 
                    "description"=c('Javascript is a statically typed and Python py is a dynamically typed',
                                    'java is a programming language'), stringsAsFactors = FALSE)

ll <- as.list(data$word)
data$new <- data$description
for(i in seq_len(nrow(data))) for(j in seq_along(ll)) {
    data$new[i] <- gsub(ll[j], "xx", data$new[i],ignore.case = T)
}
data$new

I am expecting only the exact terms to be replaced.

d.b
Ana

2 Answers

Use word boundaries (\\b) around the term so that the pattern matches only whole words:

gsub("\\bjava\\b", "xx", c("my java is", "this javascript is"))
#[1] "my xx is"           "this javascript is"

You probably want

ll <- as.list(data$word)
data$new <- data$description
for(i in seq_len(nrow(data))) for(j in seq_along(ll)) {
    data$new[i] <- gsub(paste0("\\b", ll[[j]], "\\b"), "xx", data$new[i], ignore.case = TRUE)
}
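
For reference, here is a minimal end-to-end sketch of the same idea run on the data from the question. The helper name replace_exact is hypothetical (it is not a base R function), and the sketch assumes the terms are plain words with no regex metacharacters such as + or .

```r
# Data from the question
data <- data.frame(
  word = c("python", "java"),
  description = c(
    "Javascript is a statically typed and Python py is a dynamically typed",
    "java is a programming language"
  ),
  stringsAsFactors = FALSE
)

# replace_exact() is a hypothetical helper, not part of base R.
# It wraps each term in \\b ... \\b so that only whole words match.
# Assumption: the terms contain no regex metacharacters.
replace_exact <- function(text, words, replacement = "xx") {
  for (w in words) {
    text <- gsub(paste0("\\b", w, "\\b"), replacement, text, ignore.case = TRUE)
  }
  text
}

data$new <- replace_exact(data$description, data$word)
print(data$new)
# [1] "Javascript is a statically typed and xx py is a dynamically typed"
# [2] "xx is a programming language"
```

Because gsub is vectorized over the text, only one loop over the terms is needed here; "Javascript" is left alone since there is no word boundary between "Java" and "script".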
d.b

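You can drop both loops by joining the word list with the alternation operator |, since gsub is vectorized over the input strings. Note that sub would replace only the first match in each string, so gsub is used here to match the behavior of the loop:

data$new <- gsub(paste0("\\b", ll, "\\b", collapse = "|"), "xx", data$description, ignore.case = TRUE)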

To match words you can use boundaries \\b as @d-b already showed.
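
When collapsing the terms into a single alternation, it is worth checking whether every occurrence should be replaced or only the first one per string. The sketch below uses made-up input strings (not from the question) to compare sub and gsub with the same combined pattern:

```r
words <- c("python", "java")

# One alternation covering every term: \bpython\b|\bjava\b
pattern <- paste0("\\b", words, "\\b", collapse = "|")

# Illustrative inputs, chosen so the first string contains several matches
x <- c("java and Python, then java again", "this javascript is untouched")

# sub() replaces only the first match in each string
first_only <- sub(pattern, "xx", x, ignore.case = TRUE)
print(first_only)
# [1] "xx and Python, then java again" "this javascript is untouched"

# gsub() replaces every match in each string
all_hits <- gsub(pattern, "xx", x, ignore.case = TRUE)
print(all_hits)
# [1] "xx and xx, then xx again"     "this javascript is untouched"
```

On the question's data each description happens to contain at most one whole-word match, so both functions give the same result there; they differ as soon as a term appears more than once.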

GKi