
I'm trying to search sentences for both words (case-insensitively) and punctuation symbols. The function below works well for words, but punctuation such as a dot only works when escaped with \\, and passing it unescaped leads to unwanted behavior, as shown below:

fun <- function(text, search) {
  gsub(paste0("\\b(", search, ")\\b"), paste0("<mark>", '\\1', "</mark>"),
       text, ignore.case = T)
}
> fun("this is a test.", ".")
[1] "this<mark> </mark>is<mark> </mark><mark>a</mark><mark> </mark>test<mark>.</mark>"

> fun("(this is a test)", ")")
[1] "(this is a test<mark></mark>"

Expected output:

> fun("this is a test.", ".")
[1] "this is a test<mark>.</mark>"

> fun("(this is a test)", ")")
[1] "(this is a test<mark>)</mark>"

What is the best way (a regular expression?) to search for words as well as punctuation symbols in a string?

Kamaloka
  • Why don't you want to escape `.` and `)` to get it to work? – rawr May 17 '22 at 20:48
  • Hmm, I could add some if/else statements, but that doesn't seem to work as wanted either, e.g. ```fun <- function(text, search) { if (search=="."){ search<- "\\." } gsub(paste0("\\b(", search, ")\\b"), paste0("", '\\1', ""), text, ignore.case = T) } fun("this is a test. Yes, it is.", ".")``` results in ```[1] "this is a test. Yes, it is."``` and misses the first dot. – Kamaloka May 17 '22 at 21:03
  • Just run `fun(yourtext, '\\.')`. No need for `ifelse`. – Onyambu May 17 '22 at 21:21
  • I would like it to be user friendly, i.e. typing a word marks the word, and typing a punctuation symbol marks that symbol. And, of course, typing a punctuation symbol should not break all the marks, as happens in the examples shown in my post. – Kamaloka May 17 '22 at 21:32

1 Answer


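You need to escape the special regex characters in the search string and to replace \\b with adaptive word boundaries, which require a word boundary only on a side where the search string starts or ends with a word character.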

See the R code:

## Escaping function
regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}
fun <- function(text, search) {
  gsub(paste0("(?!\\B\\w)(", regex.escape(search), ")(?<!\\w\\B)"), "<mark>\\1</mark>",
       text, ignore.case = TRUE, perl=TRUE)
}
fun("this is a test.", ".")
# [1] "this is a test<mark>.</mark>"

fun("(this is a test)", ")")
# [1] "(this is a test<mark>)</mark>"
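
As a quick sanity check (not part of the original answer), the sketch below repeats the two helpers so that it runs on its own and tries a few more inputs: the two-dot sentence from the comments under the question, a whole-word search, and a mixed-case search. The outputs shown in the comments are what the pattern as written should produce.

```r
# Copies of the two helpers above, so this snippet is self-contained.
regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}
fun <- function(text, search) {
  gsub(paste0("(?!\\B\\w)(", regex.escape(search), ")(?<!\\w\\B)"), "<mark>\\1</mark>",
       text, ignore.case = TRUE, perl = TRUE)
}

# The escaped search string is what actually reaches the regex engine.
regex.escape(".")
# [1] "\\."

# Every literal dot is marked, not only the last one.
fun("this is a test. Yes, it is.", ".")
# [1] "this is a test<mark>.</mark> Yes, it is<mark>.</mark>"

# Words are still matched whole: the "is" inside "this" is left alone.
fun("this is a test", "is")
# [1] "this <mark>is</mark> a test"

# Matching stays case insensitive.
fun("This is a test", "this")
# [1] "<mark>This</mark> is a test"
```

The lookarounds only demand a word boundary when the adjacent character of the search string is a word character, which is why a bare `.` or `)` can match next to a space or at the end of the string.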
Wiktor Stribiżew
  • FYI: If you want to learn more about word boundaries, see my YouTube video on [Dynamic adaptive word boundaries](https://www.youtube.com/watch?v=ngbxagE2b68). – Wiktor Stribiżew May 17 '22 at 21:31