
I am trying to replace words that contain special characters using `gsub`, but I get the error: invalid regular expression '\bc++\b', reason 'Invalid use of repetition operators'.

df = data.frame("word"=c('c++', '.XLS','Java-prog'))

for (i in nrow(df)) {
  df$new[i] <- gsub(paste0("\\b", df$word[i], "\\b"), "xx", df$new[i], ignore.case = TRUE)
}
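A minimal sketch of one way to make the snippet above work, assuming the goal is a literal, case-insensitive, whole-word replacement. Two things go wrong in the original pattern: `+` and `.` are regex metacharacters, and `\b` only marks a boundary between a word character and a non-word character. So even an escaped `\bc\+\+\b` cannot match `c++` followed by a space, and `\b\.XLS` needs a word character right before the dot. The sketch escapes the metacharacters and replaces `\b` with the PCRE lookarounds `(?<!\w)` and `(?!\w)`, which is why it uses `perl = TRUE`. The helper names `escape_regex` and `replace_words` and the sample text are hypothetical; the question's `df` has no column of text to search.

```r
# Hypothetical helper: backslash-escape PCRE metacharacters so each word
# is matched literally ("c++" -> "c\+\+", ".XLS" -> "\.XLS").
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# Hypothetical helper: replace each word in `words` wherever it is not
# directly preceded or followed by a word character (letter, digit, _).
replace_words <- function(text, words, replacement = "xx") {
  for (w in words) {  # visits every word, unlike `for (i in nrow(df))`
    pattern <- paste0("(?<!\\w)", escape_regex(w), "(?!\\w)")
    text <- gsub(pattern, replacement, text, perl = TRUE, ignore.case = TRUE)
  }
  text
}

df <- data.frame(word = c("c++", ".XLS", "Java-prog"), stringsAsFactors = FALSE)
# Made-up sample text; the question's df has no text column to search.
txt <- c("I write C++ code", "save it as .xls", "Java-prog rocks")
replace_words(txt, df$word)
# -> "I write xx code" "save it as xx" "xx rocks"
```

Lookarounds are used instead of `\b` because they only ask "is there a word character next to the match?", which behaves the same whether the word itself starts or ends with a letter or with a symbol.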

Actual code:

data = data.frame("word"=c('python', 'java'),
                  "description"=c('Java-script is a statically typed and Python py is a dynamically typed',
                                  'java is a programming language'), stringsAsFactors = FALSE)

ll <- as.list(data$word)
data$new <- data$description
# For each description, replace every listed word (whole word, any case) with "url"
for (i in seq_len(nrow(data))) for (j in seq_along(ll)) {
  data$new[i] <- gsub(paste0("\\b", ll[[j]], "\\b"), "url", data$new[i], ignore.case = TRUE)
}
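For the 30k-word case, one common way to cut the number of `gsub` calls is to escape all the words, join them into a single alternation, and apply it to the whole `description` column at once, since `gsub` is vectorized over its text argument. This is a sketch under stated assumptions, not a tested solution for the full data: it assumes the same `(?<!\w)`/`(?!\w)` lookarounds as a stand-in for `\b`, and `replace_all_words`, `escape_regex` and `chunk_size` are hypothetical names. Words are sorted longest first so that, for example, `java-script` is tried before its prefix `java`. The chunking is a precaution: one pattern built from tens of thousands of words may exceed PCRE's compiled-pattern size limit, so the default of 1000 is a guess to tune, not a known limit.

```r
# Hypothetical helper: backslash-escape PCRE metacharacters.
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# Hypothetical helper: replace every listed word using one gsub call per chunk of words.
replace_all_words <- function(text, words, replacement, chunk_size = 1000) {
  words <- unique(words[!is.na(words) & nzchar(words)])
  # Longest words first, so "java-script" wins over its prefix "java".
  words <- words[order(nchar(words), decreasing = TRUE)]
  chunks <- split(words, ceiling(seq_along(words) / chunk_size))
  for (chunk in chunks) {
    alternation <- paste(escape_regex(chunk), collapse = "|")
    pattern <- paste0("(?<!\\w)(?:", alternation, ")(?!\\w)")
    # gsub is vectorized over `text`, so the whole column is processed at once.
    text <- gsub(pattern, replacement, text, perl = TRUE, ignore.case = TRUE)
  }
  text
}

data <- data.frame(
  word = c("python", "java"),
  description = c("Java-script is a statically typed and Python py is a dynamically typed",
                  "java is a programming language"),
  stringsAsFactors = FALSE
)
data$new <- replace_all_words(data$description, data$word, "url")
data$new
```

On this sample `data` the result should match the nested loop, because `-` and spaces count as non-word characters for both `\b` and the lookarounds.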

The goal is to replace each matched word with `xx`.

Ana
  • `+` and `.` are special characters in regular expressions, I suggest you google "regex" to learn a bit more about them. Short-term, you need to escape the plus as `"c\\+\\+"` and `"\\.XLS"`, or (perhaps better) use `gsub(..., fixed=TRUE)`. – r2evans Aug 20 '19 at 20:37
  • Another issue is that `i in nrow(df)` will only run the last row of `df`. I'm guessing you want `i in 1:nrow(df)` instead. – Hayden Y. Aug 20 '19 at 20:40
  • You do not need a loop for this. What is in `df$new`? How many items does `df$word` contain? – Wiktor Stribiżew Aug 20 '19 at 20:46
  • Your problem is also with the word boundaries: are you sure you want to use them? Show an example with `.XLS`. What you are trying to write is https://ideone.com/ytCLY5, but I doubt it is exactly what you need. – Wiktor Stribiżew Aug 20 '19 at 21:05
  • The actual data has 30k words. I have a data frame with words and descriptions. If any of the words in the word column appear in a description, I need to replace them with a URL. So I added all the words to a list and loop over each of the descriptions, but it takes a long time because all 30k words are checked against each description. Is there a better way to do this? – Ana Aug 20 '19 at 23:26
  • I have edited the post to include the actual code I use to iterate over each description. If anyone knows a better way of doing this, please let me know. – Ana Aug 20 '19 at 23:36
  • Ok, so does https://ideone.com/K6xfqR work for you? – Wiktor Stribiżew Aug 21 '19 at 23:50
  • Ana? So, did it work for you? – Wiktor Stribiżew Aug 28 '19 at 20:43
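The two quick fixes suggested in the comments can be checked directly. This small sketch reuses the question's `df`; the sample strings are made up for illustration. It shows that `nrow(df)` is a single number, so the loop needs `seq_len(nrow(df))` to visit every row, and that `fixed = TRUE` avoids the regex error but also drops any notion of word boundaries.

```r
df <- data.frame(word = c("c++", ".XLS", "Java-prog"), stringsAsFactors = FALSE)

# nrow(df) is one number (3), so `for (i in nrow(df))` runs once, with i = 3.
length(nrow(df))     # -> 1
seq_len(nrow(df))    # -> 1 2 3, i.e. every row

# fixed = TRUE matches the pattern as a literal string, so "c++" no longer errors...
gsub("c++", "xx", "I like c++", fixed = TRUE)               # -> "I like xx"
# ...but substrings inside longer words are replaced as well.
gsub("java", "url", "javascript and java", fixed = TRUE)    # -> "urlscript and url"
# fixed = TRUE also cannot be combined with ignore.case = TRUE (R ignores it with a warning).
```

So `fixed = TRUE` fits when the words should match anywhere, case-sensitively; whole-word, case-insensitive matching still needs an escaped regular expression.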

0 Answers