
I have a problem setting up text classification with naive Bayes. I have three text files: two templates containing good and bad words, and one file for testing. My TermDocumentMatrix is created, and I also have a vector of ratings derived from the two rating templates:

TDM    word1  word2  word3  word4  ...  rating
doc1   1      1      1                  good
doc2          1      1      1           bad
doc3   ...

I did not add the rating vector to the TDM as a column, because `cbind` would convert all values to character. Instead, I split the matrix into two parts:

template_train <- complete_TDM[1:(x+y),]
text_test <- data.matrix(complete_TDM[((x+y+1):nrow(complete_TDM)),])

where `x` is the number of rows of the good-rating template and `y` the number of rows of the bad-rating template.

random <- sample(x+y)
template_train <- data.matrix(template_train[random,])   ###shuffle 
rating_vector <- as.factor(rating[random]) ###vector containing rating, shuffled the same way
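A small sketch of the split and shuffle, run on a randomly generated stand-in for `complete_TDM` (the sizes, counts and seed are arbitrary assumptions, not the real data). The point it illustrates is that a single permutation index drives both the rows and the labels, so each document keeps its rating:

```r
set.seed(42)  # only to make this sketch reproducible

# Hypothetical sizes: x good-template rows, y bad-template rows, 2 test rows
x <- 3
y <- 2
n_docs <- x + y + 2
complete_TDM <- matrix(rpois(n_docs * 4, lambda = 1), nrow = n_docs,
                       dimnames = list(paste0("doc", seq_len(n_docs)),
                                       paste0("word", 1:4)))
rating <- c(rep("good", x), rep("bad", y))

# drop = FALSE keeps a matrix even if a split has only one row
template_train <- complete_TDM[1:(x + y), , drop = FALSE]
text_test <- complete_TDM[(x + y + 1):n_docs, , drop = FALSE]

# One permutation index is applied to both the rows and the labels
random <- sample(x + y)
template_train <- template_train[random, , drop = FALSE]
rating_vector <- factor(rating[random])

# Every shuffled row still carries its original document's label
data.frame(doc = rownames(template_train), rating = rating_vector)
```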

Then I create a naiveBayes model:

naive_model <- naiveBayes(rating_vector~., x = template_train, y=rating_vector)

Finally, I want to predict the ratings of the test documents:

prediction <- predict(naive_model, text_test)

But in the last step, I receive an error:

> prediction <- predict(naive_model, text_test)
Error in log(sapply(seq_along(attribs), function(v) { : 
  non-numeric argument to mathematical function

Thanks in advance!
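The error message itself comes from base R: `log()` and the other math functions stop with exactly this message when they receive character data. A minimal base-R sketch on made-up data (the words, documents and ratings are hypothetical) shows how `cbind()` produces such character data and how restoring numeric storage avoids the error; it does not reproduce the e1071 internals:

```r
# Hypothetical term counts and ratings, standing in for the real TDM
tdm <- matrix(c(1, 1, 1, 0,
                0, 1, 1, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("doc1", "doc2"), paste0("word", 1:4)))
rating <- c("good", "bad")

# A matrix holds a single type, so cbind() coerces the counts to character
combined <- cbind(tdm, rating = rating)
typeof(combined)  # "character"

# R's math functions reject character input with the message from the question
msg <- tryCatch(log(combined[, 1:4]), error = function(e) conditionMessage(e))
msg  # "non-numeric argument to mathematical function"

# Restoring numeric storage makes the same call succeed
counts <- combined[, 1:4]
storage.mode(counts) <- "numeric"
log(counts + 1)

# Keeping the labels out of the matrix, e.g. as a factor column in a
# data frame, leaves the count columns numeric
train_df <- data.frame(tdm, rating = factor(rating))
sapply(train_df, is.numeric)
```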

Update

OK, I just solved the problem: I now use `data.matrix` instead of `as.matrix`, and `as.factor` for my rating vector. But now I have a new problem: every good document is rated bad and vice versa.

> table(prediction, rating_vector)
          rating_vector
prediction bad good
      bad    0   95
      good  94    0
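Two quick base-R checks related to this update, on hypothetical data. The first shows why the conversion mattered: `as.matrix()` turns a data frame with any text column into a character matrix, while `data.matrix()` returns a numeric matrix and silently replaces text labels with factor codes, so the rating should stay out of the predictor matrix. The second computes the accuracy implied by the confusion table posted above. A table like this is only meaningful if both vectors describe the same documents in the same order; in the question, `prediction` comes from `text_test` while `rating_vector` holds the shuffled training labels, which may be worth checking.

```r
# Hypothetical data frame mixing count columns with a text label column
df <- data.frame(word1 = c(1, 0), word2 = c(0, 1),
                 rating = c("good", "bad"), stringsAsFactors = FALSE)

m_as <- as.matrix(df)     # mixed column types give a character matrix
m_dm <- data.matrix(df)   # text becomes factor codes, result is numeric
typeof(m_as)              # "character"
m_dm[, "rating"]          # 2 1, because the factor levels are "bad", "good"

# Confusion table with the counts posted in the update
tab <- as.table(matrix(c(0, 94, 95, 0), nrow = 2,
                       dimnames = list(prediction = c("bad", "good"),
                                       rating_vector = c("bad", "good"))))
accuracy <- sum(diag(tab)) / sum(tab)
accuracy                  # 0: every count is off the diagonal
```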
wolf_wue
  • can you show me the format of `text_test`? – Maxwell Chandler Feb 20 '17 at 18:50
  • should be the same as `template_train` (`class = matrix`, `is.numeric() = TRUE`), but I changed it to `naive_model <- naiveBayes(y = rating_vector, x=template_train, rating_vector~.)` and used `data.matrix` instead of `as.matrix` – wolf_wue Feb 21 '17 at 08:27

1 Answer


You can convert `text_test` to a data frame before predicting:

text_test = data.frame(text_test)
prediction <- predict(naive_model, text_test)
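One caveat, shown as a hedged sketch with hypothetical column names: `data.frame()` defaults to `check.names = TRUE`, which rewrites column names into syntactic R names. If the model matches predictors by column name, terms containing apostrophes or hyphens could then differ between training and test data; passing `check.names = FALSE` keeps the names unchanged.

```r
# Hypothetical test matrix with terms that are not syntactic R names
text_test <- matrix(c(1, 0, 1, 1), nrow = 2,
                    dimnames = list(NULL, c("don't", "well-known")))

names(data.frame(text_test))                       # "don.t" "well.known"
names(data.frame(text_test, check.names = FALSE))  # "don't" "well-known"
sapply(data.frame(text_test), is.numeric)          # the counts stay numeric
```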
Sarwan Ali