I'm using the tm package to find associations between words in a text.
This is what I did (I'm also using the tidytext package):
library(tm)
library(tidytext)

# part1 ... part5 are character vectors holding the text of the book
book <- Corpus(VectorSource(c(part1, part2, part3, part4, part5)))
book <- tm_map(book, content_transformer(tolower))
book <- tm_map(book, removeNumbers)
book <- tm_map(book, removePunctuation)
book <- tm_map(book, stripWhitespace)
book <- tm_map(book, removeWords, stopwords("english"))
TDM_book <- TermDocumentMatrix(book)
book_tidy <- tidy(TDM_book)
When I check my final table, there are words like informationare, but there is nothing like "information are" in the text, only lots of "information this" and "information that".
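My guess at how this could happen, as a minimal sketch: removePunctuation deletes punctuation rather than replacing it with a space, so two words separated only by a punctuation mark (with no space) would end up fused. The sample string below is made up for illustration and is not taken from my actual text, and the gsub line is only one possible workaround that I assume would apply here.

```r
library(tm)

# Hypothetical sample (not from the real book): a comma with no space after it
sample_text <- "lots of information,are you sure"

# removePunctuation() deletes the comma, fusing the two neighbouring words
fused <- removePunctuation(sample_text)
print(fused)

# Possible workaround (an assumption, not verified on the full corpus):
# replace punctuation with a space instead of deleting it
spaced <- gsub("[[:punct:]]+", " ", sample_text)
print(spaced)
```

If that is indeed the cause, the same replacement could presumably be wrapped in content_transformer and passed to tm_map in place of the removePunctuation step, with stripWhitespace afterwards collapsing the extra spaces.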
How can I get rid of that "magic pasting"?
Best regards