
I am trying to do some basic text analysis. After installing the tidytext package, I tried to tokenize my data frame with `unnest_tokens()`, but I keep getting an error. I assume I am missing some package, but I am not sure how to figure out which one. Any suggestions appreciated.


library(dplyr)
library(tidytext)


# Import data
text <- read.csv("TextSample.csv", stringsAsFactors = FALSE)

n <- nrow(text)

text_df <- tibble(line = 1:n, text = text)

text_df %>%
  unnest_tokens(word, text)

> Error in is_corpus_df(corpus) : ncol(corpus) >= 2 is not TRUE

Output of `dput(head(text_df))`:

structure(list(line = 1:6, text = structure(list(text = c("furloughs", "Students do not have their books or needed materials ", "Working MORE for less pay", "None", "Caring for an immuno-compromised spouse", "being a mom, school teacher, researcher and professor" )), class = "data.frame", row.names = c(NA, -6L))), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
    Can you please provide a sample of `text_df` by using `dput(head(text_df))`? – Matt May 12 '20 at 19:31

1 Answer


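Your `text` column is actually a data frame nested inside the data frame `text_df`: in `tibble(line = 1:n, text = text)`, `text` is the whole data frame returned by `read.csv()`, not one of its columns. So `unnest_tokens()` is being applied to a data frame, but it only works on an atomic vector column (normally a character vector). The error is not about a missing package: the nested data frame is handed to the tokenizer, which appears to treat a data-frame input as a corpus that needs at least two columns (`doc_id` and `text`), hence `ncol(corpus) >= 2 is not TRUE`.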
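To confirm this (and, more generally, to track down errors like this one), it helps to check the class of each column. The sketch below rebuilds `text_df` from the question's `dput()` output using base R only, so it runs without tibble or dplyr loaded; `col_classes` is just an illustrative name.

```r
# Rebuild text_df from the dput() output posted in the question (base R only).
text_df <- structure(
  list(
    line = 1:6,
    text = structure(
      list(text = c("furloughs",
                    "Students do not have their books or needed materials ",
                    "Working MORE for less pay",
                    "None",
                    "Caring for an immuno-compromised spouse",
                    "being a mom, school teacher, researcher and professor")),
      class = "data.frame", row.names = c(NA, -6L)
    )
  ),
  row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame")
)

# Class of each column: `text` is a data.frame, not a character vector.
col_classes <- vapply(text_df, function(col) class(col)[1], character(1))
print(col_classes)

# The nested data frame has only one column, so it would fail an
# `ncol(corpus) >= 2` check like the one in the error message.
print(ncol(text_df[["text"]]))
```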

To fix this, you can do:

library(dplyr)
library(tidytext)

text_df <- text_df %>% 
  mutate_all(as.character) %>% 
  unnest_tokens(word, text)

Edit:

Since dplyr 1.0.0, `across()` supersedes scoped verbs such as `mutate_all()`, so the same step can be written as:

text_df <- text_df %>% 
  mutate(across(everything(), ~as.character(.))) %>% 
  unnest_tokens(word, text)

Which gives you:

# A tibble: 186 x 2
   line  word     
   <chr> <chr>    
 1 1     c        
 2 1     furloughs
 3 1     students 
 4 1     do       
 5 1     not      
 6 1     have     
 7 1     their    
 8 1     books    
 9 1     or       
10 1     needed   
# ... with 176 more rows
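One caveat about that output: `as.character()` applied to the nested data-frame column does not pull out its strings. It deparses the whole inner column into a single string beginning with `c(`, which `mutate()` then recycles to every row. That appears to be why the word `c` shows up, why line 1 already holds words from several responses, and why there are 186 rows (6 rows × the same 31 tokens); `line` also became `<chr>` because `mutate_all()` converts it too. A more direct fix is to replace the nested column with the character vector inside it. The base-R sketch below uses a trimmed, two-row version of the question's data; `nested`, `deparsed` and `flat` are illustrative names.

```r
# Trimmed, two-row version of the question's data: `text` is a data frame
# nested inside the outer data frame (same shape as text_df in the question).
nested <- structure(
  list(
    line = 1:2,
    text = structure(
      list(text = c("Working MORE for less pay", "None")),
      class = "data.frame", row.names = c(NA, -2L)
    )
  ),
  class = "data.frame", row.names = c(NA, -2L)
)

# as.character() on the nested column deparses it into ONE string,
# 'c("Working MORE for less pay", "None")', instead of two strings.
deparsed <- as.character(nested$text)
print(deparsed)

# Direct fix: replace the nested column with the character vector inside it.
flat <- nested
flat$text <- nested$text$text
str(flat)
```

With the real data, `text_df$text <- text_df$text$text` (or building the tibble with `tibble(line = 1:n, text = text$text)` in the first place) should keep `line` as an integer, and `text_df %>% unnest_tokens(word, text)` should then return the tokens of each response on its own line number.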