I've recently been trying to find the word frequency within a single column of a data.frame in R using the tm package. The data.frame itself has many columns, both numeric and character, but I'm only interested in a single column that is pure text. I haven't had a problem cleaning up the text itself, but as soon as I try to pull the word frequency with the findFreqTerms() command, I get the following error:
Error: inherits(x, c("DocumentTermMatrix", "TermDocumentMatrix")) is not TRUE
I took this to mean that I need to convert my data into either a DocumentTermMatrix or a TermDocumentMatrix. However, since I only have a single column to work with, I can't seem to create either one. Error below:
> Test <- DocumentTermMatrix(Types)
Error in UseMethod("TermDocumentMatrix", x) :
no applicable method for 'TermDocumentMatrix' applied to an object of class "c('PlainTextDocument', 'TextDocument')"
Is there any way to get a frequency count from the single column? I've pasted my full code below with explanations for each step I took. I appreciate any help you can give me.
library(tm)
# extract the single column I wish to analyse from the data frame
Types <- Expenses$Types
# convert everything to lower case
Types <- tolower(Types)
# remove punctuation
Types <- removePunctuation(Types)
# remove numbers
Types <- removeNumbers(Types)
# attempt to find word frequency
findFreqTerms(Types)
Error: inherits(x, c("DocumentTermMatrix", "TermDocumentMatrix")) is not TRUE
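Since I can't share the real data, here is a small stand-in so the problem can be reproduced, along with the kind of result I'm after. The Expenses values below are made up for illustration (only the Types column matters), and the counting is done with plain base R rather than tm, so treat it as a rough sketch of the desired output, not as the way tm is meant to be used.

```r
# Hypothetical stand-in for my real data frame; the values are invented.
Expenses <- data.frame(
  Amount = c(12.50, 40.00, 7.25),
  Types  = c("Office supplies", "Travel, hotel", "Office lunch 2"),
  stringsAsFactors = FALSE
)

# Same cleaning steps as in my code, written with base R so it runs without tm.
Types <- tolower(Expenses$Types)
Types <- gsub("[[:punct:]]", "", Types)  # remove punctuation
Types <- gsub("[[:digit:]]", "", Types)  # remove numbers

# Split each entry on whitespace and count every word across all rows.
words <- unlist(strsplit(Types, "[[:space:]]+"))
words <- words[nzchar(words)]            # drop any empty strings
word_counts <- sort(table(words), decreasing = TRUE)
print(word_counts)
```

What I'd like from tm is the equivalent of word_counts: one count per word over the whole column.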