This is my sample text:
text = "First sentence. This is a second sentence. I like pets e.g. cats or birds."
I have a function that splits text into sentences:
library(stringi)
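split_by_sentence <- function(text) {
  # Split on a period followed by whitespace, a question mark, or an exclamation mark
  result <- unlist(strsplit(text, "\\.\\s|\\?|!"))
  # Trim surrounding whitespace and drop empty pieces
  result <- stri_trim_both(result)
  result <- result[nchar(result) > 0]
  # Return an empty string rather than character(0) when nothing is left
  if (length(result) == 0) {
    result <- ""
  }
  return(result)
}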
It actually splits on punctuation characters, so this is the output:
> split_by_sentence(text)
[1] "First sentence" "This is a second sentence" "I like pets e.g" "cats or birds."
Is there a way to exclude special patterns like "e.g." so that the period after an abbreviation is not treated as the end of a sentence?
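To make the goal concrete, here is a minimal sketch of the behaviour I am after, assuming a Perl-style negative lookbehind is an acceptable way to skip abbreviations. The function name split_by_sentence_keep_abbrev and the choice of abbreviations ("e.g" and "i.e") are only illustrative, and the approach would need one lookbehind per abbreviation, so it may not scale well:

```r
# Hypothetical variant of split_by_sentence: a period is not a sentence
# boundary when it directly follows "e.g" or "i.e".
split_by_sentence_keep_abbrev <- function(text) {
  # perl = TRUE is needed for lookbehind; each (?<!...) must be fixed-length
  pattern <- "(?<!e\\.g)(?<!i\\.e)\\.\\s|\\?|!"
  result <- unlist(strsplit(text, pattern, perl = TRUE))
  # trimws() is base R and stands in for stri_trim_both() here
  result <- trimws(result)
  result <- result[nchar(result) > 0]
  if (length(result) == 0) {
    result <- ""
  }
  result
}

text <- "First sentence. This is a second sentence. I like pets e.g. cats or birds."
split_by_sentence_keep_abbrev(text)
```

With this pattern the period after "e.g" should no longer split the third sentence, but a more general solution that handles arbitrary abbreviations would be preferable.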