I have a DocumentTermMatrix named train_dtm and I want to normalize the term frequency counts in each document. The problem is that the result must also be a DocumentTermMatrix, because I want to pass the normalized matrix to the LDA function of the topicmodels package in R.

This is how I create the matrix:

docs_dtm <- DocumentTermMatrix(docs)

Now I want the rows of this DocumentTermMatrix to be normalized. I tried passing a weighting function through the control parameter:

docs_dtm <- DocumentTermMatrix(docs, control=list(weighting = function(x) weightTf(x, normalize=TRUE)))

but that call fails with:

Error in weightTf(x, normalize=TRUE): unused argument (normalize = TRUE)

I have also written my own code to normalize the values of train_dtm using apply(), but it does not return a DocumentTermMatrix.

Is there another way to accomplish the above task?

London guy

2 Answers

Could you try passing the weighting argument directly, e.g.:

docs_dtm <- DocumentTermMatrix(docs, control = list(weighting = weightTf, normalize = TRUE))
Joshua Rosenberg

Normalize the rows after creating the DTM:

docs_dtm_norm <- t(apply(docs_dtm, 1, function(x) x/sqrt(sum(x^2))))
Bsquare ℬℬ
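
If the normalized result has to stay a DocumentTermMatrix, one possible approach is to rescale the stored non-zero values in place instead of going through apply(), which returns an ordinary dense matrix. The following is only a sketch under some assumptions: the toy corpus is made up, the names squared, doc_norms and docs_dtm_norm are illustrative, and the "weighting" label strings are arbitrary. It relies on a DocumentTermMatrix being a slam simple_triplet_matrix whose row indices are in $i and whose values are in $v.

```r
library(tm)    # DocumentTermMatrix, VCorpus, VectorSource
library(slam)  # row_sums for sparse triplet matrices

# Toy corpus, made up for illustration only
docs <- VCorpus(VectorSource(c("apple banana apple",
                               "banana cherry",
                               "cherry cherry cherry apple")))
docs_dtm <- DocumentTermMatrix(docs)

# Euclidean norm of each row, computed from the stored non-zero entries
squared <- docs_dtm
squared$v <- squared$v^2
doc_norms <- sqrt(as.vector(row_sums(squared)))

# Divide each stored count by the norm of its document (row index is in $i).
# Empty documents have no stored entries, so nothing is divided by zero here.
docs_dtm_norm <- docs_dtm
docs_dtm_norm$v <- docs_dtm$v / doc_norms[docs_dtm$i]

# Record that the values are no longer raw term frequencies
# (label and acronym are free-form strings chosen for this sketch)
attr(docs_dtm_norm, "weighting") <- c("term frequency (L2-normalized)", "tf-l2")

class(docs_dtm_norm)
```

The object keeps its class, so functions that dispatch on DocumentTermMatrix still accept it. Whether LDA will accept it is a separate question: as far as I know, its documentation asks for term-frequency weighting with integer counts, so it is worth checking that before passing a normalized matrix.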