I have a corpus of five books (all .txt files). I want to calculate the cosine similarity between these books so I can tell how similar they are to one another. Here is my code:
library(tm)
library(lsa)
library(SnowballC)
library(slam)
library(text2vec)
book_path = "C:/Users/Desktop/Books"
corpus <- Corpus(DirSource(book_path))
clean_corpus <- tm_map(corpus, content_transformer(tolower))
clean_corpus <- tm_map(clean_corpus, removePunctuation)
clean_corpus <- tm_map(clean_corpus, stripWhitespace)
clean_corpus <- tm_map(clean_corpus, removeWords, stopwords("english"))
clean_corpus <- tm_map(clean_corpus, stemDocument)
##make a term document matrix
tdm.tf <- TermDocumentMatrix(clean_corpus)
tdm.bin <- weightBin(tdm.tf)
tdm.tfidf <- weightTfIdf(tdm.tf, normalize = TRUE)
##calculate cos sim
text2vec.cos.sim <- sim2(tdm.tfidf, method = c("cosine"), norm = c("none"))
This is where I get the following error:
Error in sim2(tdm.tfidf, method = c("cosine"), norm = c("none")) :
inherits(x, "matrix") || inherits(x, "Matrix") is not TRUE
As you can see, so far I have been relying on the tm package, and I want to use text2vec's sim2 function to calculate the cosine similarity. Can anybody help?
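Going by the error message, sim2 seems to want a base matrix or a Matrix-package object, while tm returns a slam simple_triplet_matrix. Below is a minimal sketch of one possible workaround, not a confirmed fix. It assumes that as.matrix() is acceptable for a corpus this small, that sim2 compares rows (so the term-document matrix has to be transposed to put documents in rows), and that norm = "l2" is needed because the tf-idf weights are not already L2-scaled. The three toy documents are hypothetical stand-ins for the books.

```r
library(tm)
library(text2vec)

# Hypothetical toy documents standing in for the five books
docs <- c("the cat sat on the mat",
          "the cat lay on the rug",
          "stock markets fell sharply today")
corpus <- VCorpus(VectorSource(docs))

# Same weighting as in the question
tdm.tfidf <- weightTfIdf(TermDocumentMatrix(corpus), normalize = TRUE)

# Convert the slam triplet matrix to a base matrix, then transpose so that
# each row is a document (assumption: sim2 computes row-wise similarities)
dtm.mat <- t(as.matrix(tdm.tfidf))

# norm = "l2" asks sim2 to scale each row to unit length first
# (assumption: norm = "none" expects rows that are already scaled)
cos.sim <- sim2(dtm.mat, method = "cosine", norm = "l2")
print(round(as.matrix(cos.sim), 3))
```

With the real corpus, dtm.mat would be built from the tdm.tfidf object in the code above. For much larger corpora, a sparse Matrix object would probably be a better fit than a dense base matrix.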