Is there a way to count the frequencies of certain letters in each document in a quanteda corpus in R?
Specifically, I need to count the frequency of each vowel in each document, treating e and i as "high" vowels and a, o, and u as "low" vowels.
So far, I have only encountered functions that operate on the word or sentence level, like tokens_select() or ntoken().
Any help is welcome. I considered a regex pattern, but I'm not sure how to apply it to each individual document in a quanteda corpus and get a count from it.
Here is a minimal working example to play around with:
library(quanteda)
text1 <- "This is some gibberish for you."
text2 <- "Some more gibberish. Enjoy!"
text3 <- "Gibber, gibber, gibber away."
# Build the corpus from a named character vector, so no pipe is needed
corp <- corpus(c(text1 = text1, text2 = text2, text3 = text3))
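To make the regex idea concrete, here is a rough sketch of what I have in mind. It rebuilds the same three texts, pulls the raw text out of the corpus, and counts matches per document with base R. The helper name count_matches is made up for illustration, and I am assuming that as.character() on a corpus returns one string per document (as far as I know this holds in recent quanteda versions; older ones used texts() instead). I don't know whether this is the idiomatic quanteda way.

```r
library(quanteda)

# Same three texts as in the example above, as a named character vector
txts <- c(
  text1 = "This is some gibberish for you.",
  text2 = "Some more gibberish. Enjoy!",
  text3 = "Gibber, gibber, gibber away."
)
corp <- corpus(txts)

# Hypothetical helper (not part of quanteda): number of regex matches
# in each element of a character vector, ignoring case
count_matches <- function(x, pattern) {
  lengths(regmatches(x, gregexpr(pattern, x, ignore.case = TRUE)))
}

# Assumption: as.character() yields one string per document
docs <- as.character(corp)

vowel_counts <- data.frame(
  doc  = docnames(corp),
  high = unname(count_matches(docs, "[ei]")),   # "high" vowels: e, i
  low  = unname(count_matches(docs, "[aou]"))   # "low" vowels: a, o, u
)
print(vowel_counts)
```

This matches case-insensitively, so the "E" in "Enjoy" counts as a high vowel. Per-vowel counts would only need one call per letter, e.g. count_matches(docs, "a").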