
I am encountering an issue with applying the textstat_readability function to a data.frame column. After several steps of cleaning tweet text (~53K observations), I apply textstat_readability to the cleantext column and store the Flesch scores in a new column called read_flesch:

measlestweets_readability$read_flesch <- textstat_readability(measlestweets_readability$cleantext, measure = "Flesch")$Flesch

The resulting error is: "Error in set(x, j = name, value = value) : Supplied 53380 items to be assigned to 53381 items of column 'read_flesch'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code."

What does this error mean, and how can I begin to resolve it? Thanks in advance.


This is not a well-stated question, but I suspect I know the problem anyway. As of quanteda 2.0.1, textstat_readability omits results for empty documents.

library("quanteda")
## Package version: 2.0.1

textstat_readability(c("The cat in the hat", "", "Once upon a time."))
##   document  Flesch
## 1    text1 117.160
## 2    text3  97.025

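So if your cleantext column contains an empty string (for instance, a tweet whose text was removed entirely during cleaning), the result will have no row for it. That matches your error: textstat_readability returned 53380 scores for 53381 rows, which suggests exactly one of your cleaned tweets is empty.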
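To check whether this is the cause, you could look for empty entries before scoring. A minimal sketch in base R, using a toy vector in place of measlestweets_readability$cleantext; treating whitespace-only strings as empty is an assumption about how your cleaning and textstat_readability behave:

```r
# Toy vector standing in for measlestweets_readability$cleantext (hypothetical data)
cleantext <- c("The cat in the hat", "", "Once upon a time.", "   ")

# Row positions of empty or whitespace-only tweets
# (assumption: whitespace-only text is also effectively empty for scoring)
empty_rows <- which(nchar(trimws(cleantext)) == 0)
empty_rows
## [1] 2 4
```

On your real data, length(empty_rows) should equal the number of missing scores if empty tweets are the cause.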

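You can begin to resolve it by adding a doc_id variable to your original measlestweets_readability data.frame, creating a corpus that uses that variable for its document names, and then using dplyr::left_join() to match it against the document field of the textstat_readability output, which merges in the Flesch column. Documents with no score then get NA instead of shortening the column.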
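A minimal sketch of that approach, assuming quanteda and dplyr are installed. The toy data.frame, the doc_id naming scheme (doc1, doc2, ...) and the scores helper table are hypothetical stand-ins; on quanteda 3.0 or later, textstat_readability is provided by the separate quanteda.textstats package, which the sketch loads if it is available:

```r
library("quanteda")
library("dplyr")
# quanteda >= 3.0 moved textstat_readability() into quanteda.textstats
if (requireNamespace("quanteda.textstats", quietly = TRUE)) {
  library("quanteda.textstats")
}

# Hypothetical toy data standing in for measlestweets_readability;
# the second tweet is empty after cleaning
measlestweets_readability <- data.frame(
  cleantext = c("The cat in the hat", "", "Once upon a time."),
  stringsAsFactors = FALSE
)

# 1. Give every row a unique document id (hypothetical naming scheme)
measlestweets_readability$doc_id <-
  paste0("doc", seq_len(nrow(measlestweets_readability)))

# 2. Build a corpus whose document names come from doc_id
corp <- corpus(measlestweets_readability,
               docid_field = "doc_id", text_field = "cleantext")

# 3. Score readability; empty documents may have no row in this result
read <- textstat_readability(corp, measure = "Flesch")

# 4. Plain data.frame of id + score, renamed to the target column name
scores <- data.frame(doc_id = read$document, read_flesch = read$Flesch,
                     stringsAsFactors = FALSE)

# 5. Left join keeps every original row; unmatched rows get NA
measlestweets_readability <- left_join(measlestweets_readability, scores,
                                       by = "doc_id")
```

Because left_join keeps every row of the left table in its original order, the empty tweet ends up with NA in read_flesch rather than triggering the length mismatch.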

Ken Benoit