
I have a question about a document-term matrix (DTM). I would like to use the "LDAvis" package in R. To visualize my results from the LDA algorithm, I need to calculate the number of tokens in every document, but I don't have the text corpus for the DTM in question. Does anyone know how I can calculate the token count for every document? Ideally, the output would be a list with each document name and its token count.

Kind Regards, Tom

Sylababa

1 Answer


You can use slam::row_sums. This computes the row sums of a document-term matrix directly, without first converting the DTM into a dense matrix. The function comes from the slam package, which is installed automatically alongside the tm package.

count_tokens <- slam::row_sums(dtm_goes_here)

# if you want a list
count_tokens_list <- as.list(slam::row_sums(dtm_goes_here))
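To illustrate, here is a minimal, self-contained sketch. Since no corpus is available, it builds a toy DTM by hand as a slam simple_triplet_matrix (the sparse class tm uses under the hood); the document names, terms, and counts are made up for the example:

```r
library(slam)

# Toy document-term matrix: 2 documents, 3 terms, stored sparsely.
# doc1 contains "alpha" once and "beta" twice; doc2 contains "alpha" once
# and "gamma" three times.
dtm <- simple_triplet_matrix(
  i = c(1, 1, 2, 2),   # document (row) indices
  j = c(1, 2, 1, 3),   # term (column) indices
  v = c(1, 2, 1, 3),   # term frequencies
  dimnames = list(Docs  = c("doc1", "doc2"),
                  Terms = c("alpha", "beta", "gamma"))
)

# Token count per document = sum of term frequencies in each row
count_tokens <- slam::row_sums(dtm)   # named vector: doc1 = 3, doc2 = 4

# As a named list (document name -> token count), as requested
count_tokens_list <- as.list(count_tokens)
```

The names of the resulting vector come from the DTM's row names, so the document identifiers carry over automatically; for LDAvis, the plain numeric vector is what you would pass as doc.length.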
phiver