I'm new to R, and I am trying to do sentiment analysis on customer reviews using Random Forest.

For this, I would like to use n-grams (bigrams and trigrams) as features. I used the quanteda R package.

Here is the R code:

train <- Data_clean[train.index, ]
test <- Data_clean[test.index, ]
grams <- train$Reviews %>% tokens(ngrams = 1:3) %>% # generate tokens
  dfm  # generate dfm


# Compute tf-idf weights on the document-feature matrix
tf.idf.ngrams <- tfidf(grams, normalize = FALSE, scheme = "inverse")


# train random forest classifier

ndsi.forest <- randomForest(tf.idf.ngrams[train.index, ], as.factor(train$Note.Reco[train.index]), ntree = 100)

But I get an error when building the Random Forest classifier:

> ndsi.forest <- randomForest(tf.idf.ngrams[train.index, ], as.factor(train$Note.Reco[train.index]), ntree = 100)
Error in t.default(x) : argument is not a matrix

Can you help me resolve this error, please?

Thank you

dr.nasri84
  • If you convert the dfm to a matrix, it should work: `ndsi.forest <- randomForest(as.matrix(tf.idf.ngrams[train.index, ]), as.factor(train$Note.Reco[train.index]), ntree = 100)` – amatsuo_net May 10 '17 at 15:44
  • @amatsuo_net Thank you very much for your reply. Since I extract a huge number of n-grams (from 1-grams to 4-grams), I would like to know whether there is a way to keep only the best n-grams (for example, the 500 best features). Thank you – dr.nasri84 May 11 '17 at 12:05
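
A minimal sketch of the two points raised in the comments: converting the dfm to a dense matrix before calling `randomForest()`, and keeping only the top-weighted n-grams first so that the dense matrix stays small. The toy `reviews` and `labels` vectors and the value of `k` are hypothetical stand-ins for `Data_clean`. The sketch also assumes a recent quanteda release, where `tokens_ngrams()` and `dfm_tfidf()` take the place of the `tokens(ngrams = )` and `tfidf()` calls used in the question.

```r
library(quanteda)
library(randomForest)

# Hypothetical toy data standing in for Data_clean$Reviews / Data_clean$Note.Reco
reviews <- c(
  "great product works very well",
  "very bad product broke quickly",
  "works well and great value",
  "broke quickly very bad value",
  "great value works well",
  "bad product and bad service"
)
labels <- factor(c("pos", "neg", "pos", "neg", "pos", "neg"))

# Tokenize, build 1- to 3-grams, and create the document-feature matrix
toks  <- tokens_ngrams(tokens(reviews), n = 1:3)
grams <- dfm(toks)

# Weight the dfm by tf-idf
tf.idf.ngrams <- dfm_tfidf(grams)

# Keep the k features with the largest total tf-idf weight
# (k = 10 here only because the toy corpus is tiny; the comment asks for 500)
k   <- 10
top <- names(topfeatures(tf.idf.ngrams, n = k))
reduced <- tf.idf.ngrams[, top]

# randomForest() expects a dense matrix or data frame, not a sparse dfm
x <- as.matrix(reduced)

set.seed(1)
ndsi.forest <- randomForest(x = x, y = labels, ntree = 100)
```

Ranking by summed tf-idf weight is only one possible reading of "best", and it ignores the class labels; the variable importance reported by `importance(ndsi.forest)` could be used afterwards to prune further. It may also be worth checking the indexing in the original call: `train` is already subset by `train.index`, so indexing `tf.idf.ngrams` and `train$Note.Reco` by `train.index` again might not select the rows that were intended.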
