I have created a quanteda corpus called readtext_corpus containing 190 texts. I would like to count the total number of tokens (words) in the corpus. I tried the function ntoken(), but it returns the number of tokens per text, not the total across all 190 texts.

cd3091

1 Answer

You can simply wrap the call in sum(), which adds up the per-text counts that ntoken() returns. Here is an example:

test <- c("testing string number 1","testing string number 2")

sum(quanteda::ntoken(test))

Result:

> quanteda::ntoken(test)
text1 text2 
    4     4 
> sum(quanteda::ntoken(test))
[1] 8

If you are using pipes, which is pretty common with quanteda:

> quanteda::ntoken(test) %>% sum()
[1] 8
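
The same idea applied to a corpus object rather than a character vector can be sketched as follows. This is a minimal sketch, not the asker's actual setup: `example_corpus` is a hypothetical two-text stand-in for `readtext_corpus`, and tokenising explicitly with `tokens()` first is an assumption made here so that the tokenisation step is visible and reusable.

```r
library(quanteda)

# Hypothetical stand-in for readtext_corpus (the real corpus has 190 texts)
example_corpus <- corpus(c(doc1 = "testing string number 1",
                           doc2 = "testing string number 2"))

# Tokenise once, then reuse the tokens object
toks <- tokens(example_corpus)

# Named integer vector with one count per text
tokens_per_text <- ntoken(toks)

# Single total for the whole corpus
total_tokens <- sum(tokens_per_text)
total_tokens
```

Keeping `tokens_per_text` around is optional, but it makes it easy to check individual texts before relying on the total.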
AugtPelle
  • Works for ntoken, but doesn't work for ntype (when you want the unique token count), because a word that appears in several texts gets counted once per text. I wish there was an easier way for this; they made it way too complicated. It should be possible to treat the corpus either as a whole or as a set of documents without having to recompile each case into separate objects. – Rich - enzedonline Apr 24 '22 at 09:19
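
One possible way to get the unique token count for the whole corpus, as raised in the comment above, is to count the types of the tokens object instead of summing `ntype()`. This is a sketch under the assumption that "unique tokens" means case-sensitive types as produced by the default `tokens()` settings; the variable names are illustrative only.

```r
library(quanteda)

test <- c("testing string number 1", "testing string number 2")
toks <- tokens(test)

# Summing per-text type counts counts shared words once per text
summed_types <- sum(ntype(toks))

# types() returns the distinct tokens across all texts in the object
corpus_types <- length(types(toks))

summed_types
corpus_types
```

For these two strings the summed value is 8, while the corpus as a whole has 5 distinct tokens ("testing", "string", "number", "1", "2").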