I have created a quanteda corpus called readtext_corpus containing 190 texts. I would like to count the total number of tokens (words) in the corpus. I tried the function ntoken(), which gives the number of words per text, not the total number of words across all 190 texts.
1 Answer
You can simply wrap the ntoken() call in sum(), which adds up the per-text counts. Here is an example:
test <- c("testing string number 1","testing string number 2")
sum(quanteda::ntoken(test))
Result:
> quanteda::ntoken(test)
text1 text2
4 4
> sum(quanteda::ntoken(test))
[1] 8
If you are using pipes, which is pretty common with quanteda:
> quanteda::ntoken(test) %>% sum()
[1] 8
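The same idea applied to a corpus object rather than a plain character vector can be sketched as follows. This is a minimal illustration, not a tested recipe for the asker's data: example_corpus is a made-up two-document stand-in for readtext_corpus, and the explicit tokens() step is an added assumption, included because recent quanteda releases may warn when ntoken() is called directly on a corpus or character vector.

```r
library(quanteda)

# Hypothetical stand-in for readtext_corpus (illustrative data only).
example_corpus <- corpus(c(doc1 = "testing string number 1",
                           doc2 = "testing string number 2"))

# Tokenise explicitly, then count tokens per document.
toks <- tokens(example_corpus)
per_text <- ntoken(toks)

# Add the per-document counts to get the corpus-wide total.
total_tokens <- sum(per_text)
total_tokens
```

With the real data, replacing example_corpus with readtext_corpus should give a single total over all 190 texts.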

AugtPelle
Works for ntoken, but doesn't work for ntype (when you want the unique token count). I wish there was an easier way for this; they made it way too complicated. It should be possible to treat the corpus as a whole or as a set of documents without having to recompile each case into separate objects. – Rich - enzedonline Apr 24 '22 at 09:19
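To illustrate why summing fails for ntype, and one possible workaround: per-document type counts overlap whenever documents share words, so their sum overcounts. A rough sketch, assuming it is acceptable to pool all tokens into one vector (the variable names are illustrative, and this is not necessarily the approach the package authors would recommend):

```r
library(quanteda)

toks <- tokens(c(text1 = "testing string number 1",
                 text2 = "testing string number 2"))

# Summing per-document type counts double-counts types shared across documents.
summed_types <- sum(ntype(toks))

# Pool every token from every document, then count the distinct values.
all_tokens <- unlist(as.list(toks), use.names = FALSE)
corpus_types <- length(unique(all_tokens))

summed_types
corpus_types
```

Here the two texts share "testing", "string" and "number", so the pooled count is smaller than the summed one. Note that this comparison is case-sensitive; lowercasing the tokens first would merge types that differ only in case.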