I have a list of character vectors that hold tokens for documents.
list(doc1 = c("I", "like", "apples"), doc2 = c("You", "like", "apples", "too"))
I would like to transform this list into a quanteda tokens (or dfm) object in order to make use of some of quanteda's functionality.
What's the best way to do this?
I realize I could do something like the following for each document:
tokens(paste0(c("I", "like", "apples"), collapse = " "), what = "fastestword")
Which gives:
Tokens consisting of 1 document.
text1 :
[1] "I" "like" "apples"
But this feels like a hack, and it is also unreliable because some of my tokens contain whitespace. Is there a smoother way to convert these data structures?