How to compute cosine similarity between words for a large DocumentTermMatrix

Question

I have a large term-document matrix (tdm) and need the cosine similarity of every term with every other term. The standard approaches fail with the following error:

 Error: cannot allocate vector of size 1162.4 Gb

Since I am a novice with parallel processing in R, I have not been able to use it to get the job done. Below is a small dataset. Any help would be great.

 library(tm)
 data("crude")
 tdm <- TermDocumentMatrix(crude)

Ideally, the output would look like this:

 Word   Related_Word   cosine_distance
 oil    opec           0.5
 oil    spill          0.3
 ...    ...            ...
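
For reference, here is a minimal sketch of one way to build a table of that shape without ever allocating the dense term-by-term matrix. It is not taken from the linked posts. It assumes the Matrix package is available, and the names `chunk_size` and `min_sim` are made up for illustration. The idea is to scale each term's row to unit length, so that a sparse cross-product of rows gives cosine similarities directly, then process the terms in blocks and keep only the pairs at or above a threshold. The result column is named `cosine_similarity` because the values are similarities rather than distances.

```r
library(tm)
library(Matrix)

data("crude")
tdm <- TermDocumentMatrix(crude)

# A tm TermDocumentMatrix is stored in triplet form (fields i, j, v), so it
# can be converted to a Matrix sparse matrix without going through a dense one.
m <- sparseMatrix(i = tdm$i, j = tdm$j, x = as.numeric(tdm$v),
                  dims = c(tdm$nrow, tdm$ncol))
terms <- Terms(tdm)

# Scale every term (row) to unit Euclidean length; the dot product of two
# scaled rows is then their cosine similarity.
norms  <- sqrt(rowSums(m^2))
m_unit <- Diagonal(x = 1 / norms) %*% m

chunk_size <- 500   # hypothetical: number of terms handled per block
min_sim    <- 0.5   # hypothetical: pairs below this similarity are dropped

starts <- seq(1, nrow(m_unit), by = chunk_size)
chunks <- lapply(starts, function(s) {
  rows <- s:min(s + chunk_size - 1, nrow(m_unit))
  # Sparse block of similarities: (terms in this chunk) x (all terms)
  sim <- tcrossprod(m_unit[rows, , drop = FALSE], m_unit)
  nz  <- summary(sim)          # non-zero entries as columns i, j, x
  gi  <- rows[nz$i]            # map chunk-local row index to global index
  # Keep each unordered pair once (gi < j also drops self-pairs)
  keep <- gi < nz$j & nz$x >= min_sim
  data.frame(Word              = terms[gi[keep]],
             Related_Word      = terms[nz$j[keep]],
             cosine_similarity = nz$x[keep],
             stringsAsFactors  = FALSE)
})

pairs <- do.call(rbind, chunks)
pairs <- pairs[order(-pairs$cosine_similarity), ]
head(pairs)
```

On a matrix the size of the real one, the result can still be very large if many pairs pass the threshold, so it may help to drop rare terms first (for example with `removeSparseTerms()` from tm) or to raise `min_sim`. The blocks are independent of each other, so the `lapply()` could probably be replaced by a parallel equivalent, although that is untested here.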

@José - I tried the following code and it gave me an error as below. Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105 — NinjaR, Mar 13 '17 at 17:26
These posts should get you where you need to go: http://stackoverflow.com/questions/41721431/cosine-similarity-of-2-dtms-in-r/41723921#41723921, http://stackoverflow.com/questions/29750519/r-calculate-cosine-distance-from-a-term-document-matrix-with-tm-and-proxy — emilliman5, Mar 13 '17 at 17:26
Possible duplicate of [R: Calculate cosine distance from a term-document matrix with tm and proxy](http://stackoverflow.com/questions/29750519/r-calculate-cosine-distance-from-a-term-document-matrix-with-tm-and-proxy) — emilliman5, Mar 13 '17 at 17:27
