I built a corpus in R using the tm package. I want to set a frequency threshold and keep only the words that occur at least 4 times across the entire corpus. After that, I need to build a document-term matrix based on these terms.
'Data' is a 45k by 2 matrix. The first column, 'Text', contains about 10 words per row on average. The second column, 'Code', contains a 5-digit code for each row.
Almost 15k words in 'Text' occur only once or twice. I want to remove them and then build the document-term matrix.
Here is the code I tried:
MyCorpus <- Corpus(VectorSource(Data$Text))
MyCorpus <- tm_map(MyCorpus , removeWords, stopwords('english'))
MyCorpus <- tm_map(MyCorpus , stripWhitespace)
MyCorpus <- termFreq(MyCorpus , control = list(local = c(4, Inf)))
But I get this error on line 4 (the termFreq call):
Error: inherits(doc, "TextDocument") is not TRUE
What should I do?
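For reference, here is a minimal sketch of the result I am after, written against a toy input rather than the real data. It assumes that termFreq() expects a single text document rather than a whole corpus (which would explain the error), so it builds the matrix with DocumentTermMatrix() first and then filters by total count with findFreqTerms(). The names texts, min_count, frequent_terms and dtm_filtered are made up for illustration, and I have not checked this on the full 45k rows.

```r
library(tm)

# Toy stand-in for Data$Text (hypothetical sample, not the real 45k rows)
texts <- c("apple banana apple",
           "apple cherry",
           "banana apple durian",
           "banana banana")

# Same preprocessing steps as in the question
corpus <- VCorpus(VectorSource(texts))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Build the document-term matrix for the whole corpus
# instead of calling termFreq() on the corpus object
dtm <- DocumentTermMatrix(corpus)

# Keep only the terms whose total count over all documents
# is at least min_count (hypothetical threshold variable)
min_count <- 4
frequent_terms <- findFreqTerms(dtm, lowfreq = min_count)
dtm_filtered <- dtm[, frequent_terms]

inspect(dtm_filtered)
```

As far as I can tell from the documentation, passing control = list(bounds = list(global = c(4, Inf))) to DocumentTermMatrix() filters on the number of documents a term appears in, not on its total count, so it would not be quite the same thing as the threshold described above.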