I have several large TermDocumentMatrices, which I'm trimming down to a more manageable size using the removeSparseTerms() function. One of the arguments I have to pass to it, of course, is sparse.

Because the TDMs are all quite different, I'd like to be able to base the value I pass to sparse on some measure of their sparsity. I can see this measure using the inspect() function, but for the life of me I can't find a way to extract this from the metadata. Is there a suitable function in tm that I just haven't found?

CrowsNose
  • Sparsity is the number of zero-valued elements over the total number of elements in the matrix, so you can get it using `1-length(M$v)/(nrow(M)*ncol(M))`. – Scarabee Oct 25 '17 at 10:27
  • If you look at the source code of `tm:::print.TermDocumentMatrix`, you can see that the actual formula is `round((1 - length(x$v)/prod(dim(x))) * 100)` (which is essentially the same thing). – Scarabee Oct 25 '17 at 10:35
  • That's perfect, thank you! – CrowsNose Oct 25 '17 at 10:49
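For anyone landing here later, the formula from the comments can be wrapped in a small helper. This is only a sketch: `tdm_sparsity` and `mock_tdm` are hypothetical names, not part of tm, and the code assumes the object exposes the `v`, `nrow` and `ncol` fields of the triplet representation that a TermDocumentMatrix is built on. The mock list stands in for a real TDM so the example runs without tm installed.

```r
# Overall sparsity of a term-document matrix stored in triplet form.
# Assumption: `x` has the fields v, nrow and ncol, and v holds only
# the non-zero entries.
tdm_sparsity <- function(x) {
  # as.numeric() avoids integer overflow when nrow * ncol exceeds 2^31 - 1
  total <- as.numeric(x$nrow) * x$ncol
  if (total == 0) return(NA_real_)   # empty matrix: sparsity is undefined
  1 - length(x$v) / total            # share of cells with no stored entry
}

# Hypothetical stand-in for a TDM: 4 terms x 5 documents, 6 non-zero counts.
mock_tdm <- list(
  i = c(1L, 1L, 2L, 3L, 4L, 4L),     # term (row) indices
  j = c(1L, 2L, 2L, 5L, 3L, 4L),     # document (column) indices
  v = c(2, 1, 3, 1, 1, 4),           # non-zero counts
  nrow = 4L, ncol = 5L
)

s <- tdm_sparsity(mock_tdm)
print(s)                # 1 - 6/20 = 0.7
print(round(s * 100))   # 70, the percentage form used by the print method
```

On a real TDM the same call should work as `tdm_sparsity(my_tdm)`. One caveat: this is the sparsity of the whole matrix, whereas the `sparse` argument of `removeSparseTerms()` is a per-term threshold, so the overall figure is probably best treated as a starting point for choosing `sparse` rather than a value to pass through unchanged.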
