
I have a question about a document-term matrix (DTM). I would like to use the "LDAvis" package in R. To visualize my results from the LDA algorithm, I need to calculate the number of tokens in every document, but I don't have the text corpus for the DTM in question. Does anyone know how I can calculate the token count for every document? Ideally, the output would be a list with each document name and its token count.

Kind Regards, Tom

Sylababa

1 Answer


You can use slam::row_sums. This computes the row sums of a document-term matrix directly, without first converting the DTM into a dense matrix. The function comes from the slam package, which is installed automatically alongside the tm package.

count_tokens <- slam::row_sums(dtm_goes_here)

# if you want a list
count_tokens_list <- as.list(slam::row_sums(dtm_goes_here))
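To illustrate, here is a minimal, self-contained sketch. Since no corpus is available, it builds a toy DTM by hand as a slam simple_triplet_matrix (the sparse class tm uses under the hood); the document names, terms, and counts are made up for the example:

```r
library(slam)

# Toy document-term matrix: 2 documents, 3 terms, stored sparsely.
# doc1 contains "alpha" once and "beta" twice; doc2 contains "alpha" once
# and "gamma" three times.
dtm <- simple_triplet_matrix(
  i = c(1, 1, 2, 2),   # document (row) indices
  j = c(1, 2, 1, 3),   # term (column) indices
  v = c(1, 2, 1, 3),   # term frequencies
  dimnames = list(Docs  = c("doc1", "doc2"),
                  Terms = c("alpha", "beta", "gamma"))
)

# Token count per document = sum of term frequencies in each row
count_tokens <- slam::row_sums(dtm)   # named vector: doc1 = 3, doc2 = 4

# As a named list (document name -> token count), as requested
count_tokens_list <- as.list(count_tokens)
```

The names of the resulting vector come from the DTM's row names, so the document identifiers carry over automatically; for LDAvis, the plain numeric vector is what you would pass as doc.length.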
phiver