I have a large term-document matrix (TDM), and I need the cosine similarity of every term with every other term. The standard approaches fail with the following error:

 Error: cannot allocate vector of size 1162.4 Gb

Since I am a novice with parallel processing in R, I have not been able to use it to get the job done. Below is a small dataset. Any help would be great.

 library(tm)
 data("crude")
 tdm <- TermDocumentMatrix(crude)

The ideal output would look like this:

 Word   Related_Word   cosine_distance
 oil    opec           0.5
 oil    spill          0.3
 ...    ...            ...
NinjaR
  • Try the package quanteda, it uses sparse matrix. – José Mar 13 '17 at 15:52
  • @José - I tried the following code and it gave me an error as below. Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105 – NinjaR Mar 13 '17 at 17:26
  • These posts should get you where you need to go. http://stackoverflow.com/questions/41721431/cosine-similarity-of-2-dtms-in-r/41723921#41723921, http://stackoverflow.com/questions/29750519/r-calculate-cosine-distance-from-a-term-document-matrix-with-tm-and-proxy – emilliman5 Mar 13 '17 at 17:26
  • Possible duplicate of [R: Calculate cosine distance from a term-document matrix with tm and proxy](http://stackoverflow.com/questions/29750519/r-calculate-cosine-distance-from-a-term-document-matrix-with-tm-and-proxy) – emilliman5 Mar 13 '17 at 17:27
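
Following the sparse-matrix suggestion in the comments, here is a minimal sketch of one way to get the pairwise table without ever building a dense terms-by-terms matrix. It is only an illustration on the `crude` data, not something verified on the full-size matrix. The function name `cosine_pairs` and its parameters `min_sim` and `block_size` are hypothetical names introduced for this example, and it assumes the `Matrix` package is installed. The value column is named `cosine_similarity` because the values computed are similarities, not distances.

```r
library(tm)
library(Matrix)

# Hypothetical helper: cosine similarity between all pairs of terms,
# computed block by block on a sparse matrix and returned in long format.
cosine_pairs <- function(tdm, min_sim = 0, block_size = 500) {
  terms <- Terms(tdm)

  # A tm TermDocumentMatrix is stored in triplet form (i, j, v);
  # rebuild it as a sparse matrix from the Matrix package.
  m <- sparseMatrix(i = tdm$i, j = tdm$j, x = as.numeric(tdm$v),
                    dims = c(tdm$nrow, tdm$ncol))

  # Scale every term (row) to unit length so that dot products are cosines.
  norms  <- sqrt(rowSums(m * m))
  m_unit <- Diagonal(x = 1 / norms) %*% m

  # Process the terms in blocks of rows to bound the memory used at once.
  starts <- seq(1, nrow(m_unit), by = block_size)
  out <- lapply(starts, function(s) {
    idx <- s:min(s + block_size - 1, nrow(m_unit))
    sim <- tcrossprod(m_unit[idx, , drop = FALSE], m_unit)  # block x terms
    p   <- summary(sim)      # non-zero entries as columns i, j, x
    gi  <- idx[p$i]          # map block row index back to the global term index
    keep <- gi < p$j & p$x > min_sim  # each pair once, no self-pairs
    data.frame(Word = terms[gi[keep]],
               Related_Word = terms[p$j[keep]],
               cosine_similarity = p$x[keep],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

data("crude")
tdm <- TermDocumentMatrix(crude)
res <- cosine_pairs(tdm, min_sim = 0.2)
head(res[order(-res$cosine_similarity), ])
```

Only pairs of terms that share at least one document get a non-zero similarity, so only those are stored. If most terms co-occur, the long-format result itself can still be very large, which is why the sketch includes a `min_sim` threshold to drop weak pairs.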
