I've been trying to apply unnest_tokens from tidytext to a data frame column to generate common bigrams and trigrams. The texts are short and come from more than 200 articles. They're also a single-column subset of a larger CSV.
I've tried the following, to no avail:
1. Setting stringsAsFactors = FALSE
2. Using unnest_ and unnest_tokens_
Example:
bookparagraphs.csv
a <- data.frame("texts" = bookparagraphs$text[1:10], stringsAsFactors = FALSE)
str(a)
'data.frame': 10 obs. of 1 variable:
$ text: Factor w/ 6552 levels
Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.
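For reference, here is a minimal sketch of the bigram step I'm aiming for. It rests on an assumption suggested by the error message: that the tokenizer needs character input, so the factor column is coerced with as.character() first. The toy data and the names docs and bigrams are hypothetical stand-ins, not taken from bookparagraphs.csv.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for bookparagraphs$text. The column is
# stored as a factor on purpose, to mimic the str() output above. Note that
# stringsAsFactors = FALSE does not convert a column that is already a factor.
docs <- data.frame(
  text = factor(c("the quick brown fox", "the quick red fox")),
  stringsAsFactors = FALSE
)

# Assumption: the input must be a character vector, so coerce the factor.
docs$text <- as.character(docs$text)

# Split each text into overlapping two-word sequences and count them.
bigrams <- docs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  count(bigram, sort = TRUE)

print(bigrams)
```

Under the same assumption, changing n = 2 to n = 3 should give trigrams.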
However, tm_map works well once I convert my texts to a corpus and then to a document-term matrix (DTM). I'm able to count and review word co-occurrences just fine.
I'd like to get better at using tidytext, so I'd like to find out how this works and where I went wrong.
I'd appreciate any suggestions. Thank you!