
I'm using the tm package to find associations between words in a text.

This is what I did (I'm also using the tidytext package):

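library(tm)        # Corpus, tm_map, TermDocumentMatrix
library(tidytext)  # tidy() method for TermDocumentMatrix

# part1..part5 are character vectors holding the text of the book
book <- Corpus(VectorSource(c(part1,part2,part3,part4,part5)))
# Clean up: lower-case, then drop numbers, punctuation, extra whitespace and stopwords
book <- tm_map(book, content_transformer(tolower))
book <- tm_map(book, removeNumbers)
book <- tm_map(book, removePunctuation)
book <- tm_map(book, stripWhitespace)
book <- tm_map(book, removeWords, stopwords("english"))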

TDM_book <- TermDocumentMatrix(book)

book_tidy <- tidy(TDM_book)

When I check my final table, there are words like informationare, but the text contains nothing like "information are", only lots of "information this" and "information that".

How can I get rid of that "magic pasting"?
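My current guess, which I have not confirmed on the real text, is that removePunctuation deletes punctuation instead of replacing it with a space, so something like "information.Are" (no space after the period) becomes one token. Here is a minimal sketch of that idea. The input string and the helper name toSpace are made up for illustration; content_transformer, tm_map, VCorpus and stripWhitespace come from tm.

```r
library(tm)

# Hypothetical one-line input; the real text would come from part1..part5.
x <- tolower("lots of information.Are we sure")

# Deleting punctuation fuses the neighbouring words.
glued <- gsub("[[:punct:]]+", "", x)
print(glued)   # "lots of informationare we sure"

# Replacing punctuation with a space keeps them apart.
spaced <- gsub("[[:punct:]]+", " ", x)
print(spaced)  # "lots of information are we sure"

# The same idea as a tm transformation (toSpace is a hypothetical helper name).
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
demo <- VCorpus(VectorSource(x))
demo <- tm_map(demo, toSpace, "[[:punct:]]+")
demo <- tm_map(demo, stripWhitespace)  # collapse any doubled spaces
print(content(demo[[1]]))
```

If that is the cause, swapping the removePunctuation step for something like toSpace, followed by stripWhitespace, might avoid the glued words, but I'm not sure it is the whole story.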

Best regards

pachadotdev
