I have a large data frame of news articles. I have noticed that in some articles two words are joined by a dot, as in this example: "The government.said it was important to quit." I am going to do some topic modelling, so I need to separate every single word.
This is the code I used to separate those words:
#String example
test <- c("i need.to separate the words connected by dots. however, I need.to keep having the dots separating sentences")
#Code to separate the words
test <- do.call(paste, as.list(strsplit(test, "\\.")[[1]]))
#This is what I get
> test
[1] "i need to separate the words connected by dots however, I need to keep having the dots separating sentences"
As you can see, this removes all the dots (periods) from the text, including the ones that end sentences. How can I get the following outcome instead?
"i need to separate the words connected by dots. however, I need to keep having the dots separating sentences"
Final note
My data frame contains 17,000 articles, and all the text is in lowercase. The example above is just a small illustration of the problem I have when trying to separate two words connected by a dot. Additionally, is there any way I can use strsplit on a list?
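For reference, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes that a dot should be replaced only when it sits directly between two letters, so a dot followed by a space is left alone. The helper name split_dotted_words and the articles list are hypothetical and used only for illustration.

```r
# Hypothetical helper: replace a dot with a space only when it sits
# directly between two letters, so sentence-ending dots are kept.
split_dotted_words <- function(x) {
  gsub("(?<=[[:alpha:]])\\.(?=[[:alpha:]])", " ", x, perl = TRUE)
}

test <- "i need.to separate the words connected by dots. however, I need.to keep having the dots separating sentences"
result <- split_dotted_words(test)

# gsub() is vectorised, so a character column can be passed to the helper directly.
# strsplit() expects a character vector, so for a list one option is lapply(),
# which applies a function to each element in turn.
articles <- list("the government.said it was important.",
                 "prices rose.sharply today.")
cleaned <- lapply(articles, split_dotted_words)
words <- lapply(cleaned, function(a) strsplit(a, " ", fixed = TRUE)[[1]])
```

Note that under this assumption abbreviations written with internal dots (for example "u.s.") would also be split, while numbers such as "3.5" are left untouched because the pattern only matches letters.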