I have a large term-document matrix (tdm), and I need the cosine similarity of every term with every other term. The standard approaches do not work because they fail with the following error:
Error: cannot allocate vector of size 1162.4 Gb
Since I am a novice with parallel processing in R, I have not been able to use it to get the job done. Below is a small sample dataset. Any help would be great.
library(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)
Ideally, the output would look like this:
Word   Related_Word   cosine_distance
oil    opec           0.5
oil    spill          0.3
...    ...            ...
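For illustration, here is a minimal sketch of how that layout could be produced on the small sample with a dense matrix. It is an assumption about what a "standard procedure" looks like, not necessarily the code that raised the error above. The names m, m_unit, sim, idx and result are placeholders. Because it builds a full terms-by-terms matrix, it would presumably hit the same memory limit on the large tdm.

```r
library(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)

# Dense approach: fine for the sample, not expected to scale to a huge vocabulary
m <- as.matrix(tdm)                       # rows = terms, columns = documents
m <- m[rowSums(m^2) > 0, , drop = FALSE]  # drop all-zero rows to avoid dividing by 0
m_unit <- m / sqrt(rowSums(m^2))          # scale every term vector to unit length
sim <- tcrossprod(m_unit)                 # terms x terms cosine similarity

# Keep each unordered pair of terms once and reshape to long format
idx <- which(upper.tri(sim), arr.ind = TRUE)
result <- data.frame(
  Word = rownames(sim)[idx[, "row"]],
  Related_Word = rownames(sim)[idx[, "col"]],
  cosine_distance = sim[idx],             # column named as in the desired output; values are similarities
  stringsAsFactors = FALSE
)

# Most similar pairs first
head(result[order(-result$cosine_distance), ])
```

The sketch stores every pair, so memory grows with the square of the number of terms.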