
I have a corpus of 5 different books (all .txt files). I want to calculate the cosine similarity between these books so I can tell how similar they are to one another. Here is my code:

    library(tm)
    library(lsa)
    library(SnowballC)
    library(slam)
    library(text2vec)
    book_path = "C:/Users/Desktop/Books"
    corpus <- Corpus(DirSource(book_path))
    clean_corpus <- tm_map(corpus, content_transformer(tolower))
    clean_corpus <- tm_map(clean_corpus, removePunctuation)
    clean_corpus <- tm_map(clean_corpus, stripWhitespace)
    clean_corpus <- tm_map(clean_corpus, removeWords, stopwords("english"))
    clean_corpus <- tm_map(clean_corpus, stemDocument)
    ##make a term document matrix
    tdm.tf <- TermDocumentMatrix(clean_corpus)
    tdm.bin <- weightBin(tdm.tf)
    tdm.tfidf <- weightTfIdf(tdm.tf, normalize = TRUE)
    ##calculate cos sim
    text2vec.cos.sim <- sim2(tdm.tfidf, method = c("cosine"), norm = c("none"))

This is where I get the following error:

    Error in sim2(tdm.tfidf, method = c("cosine"), norm = c("none")) : 
      inherits(x, "matrix") || inherits(x, "Matrix") is not TRUE
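The check in the error message can be reproduced directly. The following is a minimal sketch on a hypothetical two-document corpus (not the books from the question). tm stores a TermDocumentMatrix, including the tf-idf weighted one, as a slam simple_triplet_matrix, and that class passes neither of the inherits() tests that sim2 runs.

```r
library(tm)

# Tiny hypothetical corpus, used only to inspect the object type
corpus <- VCorpus(VectorSource(c("first sample text", "second sample text")))
tdm.tfidf <- weightTfIdf(TermDocumentMatrix(corpus))

# tm keeps term-document matrices as slam sparse triplets,
# not as base "matrix" or Matrix-package objects
class(tdm.tfidf)

# This is the same condition sim2() checks, and it is FALSE here
inherits(tdm.tfidf, "matrix") || inherits(tdm.tfidf, "Matrix")

# After as.matrix() the object is an ordinary base matrix
is.matrix(as.matrix(tdm.tfidf))
```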

As you can see, so far I have been relying on the tm package, and I want to use text2vec's sim2 function to calculate the cosine similarity. Can anybody help?
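For reference, here is the cosine similarity between two books $i$ and $j$, where $w_{ti}$ is the tf-idf weight of term $t$ in book $i$ and $\mathbf{d}_i$ is the vector of those weights. As far as the text2vec documentation describes it, sim2(..., method = "cosine", norm = "l2") first scales each row to unit length and then takes dot products, which gives exactly this quantity. With norm = "none", it assumes the rows are already scaled and returns only the numerator. weightTfIdf(normalize = TRUE) divides term frequencies by document length rather than producing unit-length vectors, so norm = "l2" is probably the setting wanted here.

$$
\cos(\mathbf{d}_i, \mathbf{d}_j)
= \frac{\mathbf{d}_i \cdot \mathbf{d}_j}{\lVert \mathbf{d}_i \rVert_2 \, \lVert \mathbf{d}_j \rVert_2}
= \frac{\sum_t w_{ti}\, w_{tj}}{\sqrt{\sum_t w_{ti}^2}\, \sqrt{\sum_t w_{tj}^2}}
$$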

Jimmy
  • Can you please provide a complete, working example (see instructions [here](https://stackoverflow.com/help/minimal-reproducible-example))? For example, this means including the code that loads the tm package. – maja zaloznik May 20 '20 at 20:09
  • Hi, thanks for replying. I just updated the code above to include the lines that load the packages. I hope it is good now. – Jimmy May 20 '20 at 22:31
  • The packages are fine, yes, but the example is still not working, i.e. it is not reproducible for anyone but you. There is presumably an issue with the corpus, and without access to it or a sample of it, it will be tricky for anyone to figure out what is going wrong. – maja zaloznik Jun 09 '20 at 10:42

1 Answer


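Try converting it with `tdm.tfidf <- as.matrix(tdm.tfidf)` before running sim2. sim2 only accepts a base matrix or a Matrix-package object, which is what the error message checks, and tm's TermDocumentMatrix is neither.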
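Below is a fuller sketch of the fix, assuming the same cleaning pipeline as in the question. Two details beyond the as.matrix() conversion are worth checking. First, sim2() compares rows, and a TermDocumentMatrix has terms in rows, so this version builds a DocumentTermMatrix (books in rows) instead. Second, it uses norm = "l2" so the result is an actual cosine similarity. The five short texts are hypothetical placeholders for the books, not data from the question.

```r
library(tm)
library(SnowballC)  # stemDocument() relies on SnowballC
library(text2vec)

# Hypothetical stand-ins for the five books; with the real data use
# corpus <- Corpus(DirSource(book_path)) as in the question
docs <- c(
  "The cat sat on the mat while another cat slept.",
  "A cat and a dog sat together on the mat.",
  "Stock markets fell sharply as investors sold their shares.",
  "Investors bought shares again after the markets recovered.",
  "The dog chased the cat around the garden."
)
corpus <- VCorpus(VectorSource(docs))

# Same cleaning steps as in the question
clean_corpus <- tm_map(corpus, content_transformer(tolower))
clean_corpus <- tm_map(clean_corpus, removePunctuation)
clean_corpus <- tm_map(clean_corpus, stripWhitespace)
clean_corpus <- tm_map(clean_corpus, removeWords, stopwords("english"))
clean_corpus <- tm_map(clean_corpus, stemDocument)

# DocumentTermMatrix puts documents in rows, which is what sim2() compares
dtm.tfidf <- weightTfIdf(DocumentTermMatrix(clean_corpus), normalize = TRUE)

# sim2() only accepts base matrices or Matrix objects, so convert first
dtm.mat <- as.matrix(dtm.tfidf)

# norm = "l2" scales every row to unit length before the dot products
book.cos.sim <- sim2(dtm.mat, method = "cosine", norm = "l2")
round(book.cos.sim, 2)
```

The result is a documents × documents matrix (5 × 5 for five books) whose diagonal is 1 for every non-empty document. Converting the TermDocumentMatrix and passing it to sim2 unchanged would instead compare terms with terms, giving a terms × terms matrix that can be very large for full-length books.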

Sdae