I'm struggling with NA values in a term-document matrix (TDM) that prevent me from running k-means clustering. Initially I set:
titles.tdm <- as.matrix(TermDocumentMatrix(titles.cw, control = list(bounds = list(global = c(10,Inf)))))
titles.sc <- scale(na.omit(titles.tdm))
and got a matrix of 418 terms and 6955 documents. At this point, executing:
titles.km <- kmeans(titles.sc, 2)
throws
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
When I tried to remove those values with:
titles.sf <- titles.sc[,colSums(titles.sc) > 0]
I got a matrix of 4695 documents, but applying the kmeans function still throws this error. When I view the titles.sf variable, there are still columns (documents) with NA values. I'm confused and don't know what I'm doing wrong. How can I remove those documents?
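To make the symptom concrete, here is a minimal sketch of how columns containing NA values can be located and dropped, using a small made-up matrix. The names toy.tdm, toy.sc, na.docs and toy.sf and all the values are hypothetical, not my real data. It assumes the NaN values come from scale() dividing by a zero standard deviation in constant columns, which I have not confirmed for my corpus:

```r
# Toy stand-in for the term-document matrix: 3 terms (rows) x 4 documents (columns).
# All values are invented for illustration only.
toy.tdm <- matrix(c(1, 0, 2,
                    1, 1, 1,   # constant column: zero variance
                    0, 3, 1,
                    0, 0, 0),  # all-zero column: zero variance
                  nrow = 3,
                  dimnames = list(c("t1", "t2", "t3"),
                                  c("d1", "d2", "d3", "d4")))

# scale() centers each column and divides it by its standard deviation,
# so a zero-variance column becomes 0/0 = NaN.
toy.sc <- scale(toy.tdm)

# Flag every document (column) that holds at least one NA/NaN.
na.docs <- colSums(is.na(toy.sc)) > 0
names(which(na.docs))   # "d2" "d4"

# Keep only the documents without NA values.
toy.sf <- toy.sc[, !na.docs, drop = FALSE]
anyNA(toy.sf)           # FALSE
```

In this toy case, filtering on colSums(is.na(...)) removes the NA columns, whereas a test such as colSums(toy.sc) > 0 returns NA for those same columns, so they are not dropped.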
Earlier, to delete blank documents, I applied:
titles.cw <- titles.cc[which(str_trim(titles.cc$content) != "")]
where titles.cc is a plain Corpus object from the tm library. It probably worked, but the NA values are in documents that are definitely not blank.