I have written an R script that runs and predicts successfully, but only when a CSV with multiple entries is passed as input to the classifier.

training_set = read.csv('finaldata.csv')
library(randomForest)
set.seed(123)
classifier = randomForest(x = training_set[-5],
                      y = training_set$Song,
                      ntree = 50)

test_set = read.csv('testSet.csv')
y_pred = predict(classifier, newdata = test_set)

The code above runs successfully, but instead of giving 10+ inputs to the classifier, I want to pass a data.frame with a single row as input. That works with other classifiers but not with this one. Why? The following code does not work and throws an error:

y_pred = predict(classifier, data.frame(Emot="happy",Pact="Walking",Mact="nothing",Session="morning"))

Error in predict.randomForest(classifier, data.frame(Emot = "happy", : Type of predictors in new data do not match that of the training data.

I even tried keeping a single entry in testinput.csv, and it still throws the same error. How can I solve it? This code is the back end of another program of mine, and I want to pass only a single entry as the test input to get a prediction. All columns are factors in both the training and the test set. Help appreciated.

PS: Previous solutions to the same error didn't help me.

str(test_set)

'data.frame':   1 obs. of  5 variables:
 $ Emot   : Factor w/ 1 level "fear": 1
 $ Pact   : Factor w/ 1 level "Bicycling": 1
 $ Mact   : Factor w/ 1 level "browsing": 1
 $ Session: Factor w/ 1 level "morning": 1
 $ Song   : Factor w/ 1 level "Dusk Till Dawn.mp3": 1

str(training_set)

'data.frame':   1052 obs. of  5 variables:
 $ Emot   : Factor w/ 8 levels "anger","contempt",..: 4 7 6 6 4 3 4 6 4 6 ...
 $ Pact   : Factor w/ 5 levels "Bicycling","Driving",..: 1 2 2 2 4 3 1 1 3 4 ...
 $ Mact   : Factor w/ 6 levels "browsing","chatting",..: 1 6 1 4 5 1 5 6 6 6 ...
 $ Session: Factor w/ 4 levels "afternoon","evening",..: 3 4 3 2 1 3 1 1 2 1 ...
 $ Song   : Factor w/ 101 levels "Aaj Ibaadat.mp3",..: 29 83 47 72 29 75 77 8 30 53 ...
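
A likely reading of the two `str()` outputs above: every factor in the single-row test set has exactly 1 level, while the matching training factor has 4 to 8, so the predictors do not have the same factor structure. The base-R sketch below shows how that mismatch could be checked column by column. The two small data frames are hypothetical stand-ins for `training_set` and `test_set`, not the real data.

```r
# Hypothetical miniature versions of the data frames from the question
training_set <- data.frame(
  Emot    = factor(c("anger", "fear", "happy", "fear")),
  Session = factor(c("morning", "evening", "afternoon", "night"))
)
test_set <- data.frame(
  Emot    = factor("fear"),
  Session = factor("morning")
)

# For each predictor, check whether the test levels equal the training levels
same_levels <- sapply(names(test_set), function(col) {
  identical(levels(test_set[[col]]), levels(training_set[[col]]))
})
print(same_levels)
```

Any column reported as `FALSE` has a level set that differs from the training data, which is a plausible trigger for the error quoted above.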
minigeek
  • Please show what `test_set` or `training_set` look like (e.g. `str()`). The error message is pretty clear, you probably misspelled something. – Roman Luštrik Apr 15 '18 at 07:23
  • @RomanLuštrik added in question – minigeek Apr 15 '18 at 07:28
  • Can you please use `str`? This doesn't tell us much about the variables (e.g. type). – Roman Luštrik Apr 15 '18 at 07:29
  • @RomanLuštrik added – minigeek Apr 15 '18 at 07:32
  • Hmm, can't think of anything off the top of my head. If you provide a reproducible example I can try a few things, like specifying a model formula (`randomForest(Song ~ ., data = training_set)`) instead of passing `x` and `y`. – Roman Luštrik Apr 15 '18 at 07:34
  • @RomanLuštrik I tried that too, and it doesn't work either. I also can't follow what this post says: https://stackoverflow.com/questions/24829674/r-random-forest-error-type-of-predictors-in-new-data-do-not-match?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa – minigeek Apr 15 '18 at 07:41
  • If the problem is only with a single-row dataset, then maybe someone somewhere has forgotten a `drop = FALSE` in a subsetting operation. What happens when you use `options(error=recover)`? –  Apr 15 '18 at 07:48
  • I tried `drop`, and `options(error=recover)` offers two frames to debug, `predict()` and `predict.randomForest()`, which doesn't help. I just tried aligning the levels of both sets using `levels(dataset$Emot) <- levels(training_set$Emot)`. It runs, but gives the same wrong output for every input. – minigeek Apr 15 '18 at 08:06
  • @RomanLuštrik Finally solved it with a weird solution. Thanks for your time and help. – minigeek Apr 15 '18 at 08:27

1 Answer

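OK, this worked, although it is a weird solution: make the factor levels of the test set match those of the training set. The following code binds the first row of the training set to the test set and then deletes that row. Because `rbind` merges the factor levels of both data frames, the test set keeps the training levels even after the borrowed row is removed.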

test_set <- rbind(training_set[1, ] , test_set)
test_set <- test_set[-1,]

Done! It works for a single input built in code as well as for a single-entry .csv file, and the model no longer throws the error.
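
As a possible alternative to the `rbind` trick, the training levels can be set explicitly when the single observation is built. This is a minimal base-R sketch, assuming the new values are plain strings. The small `training_set` here is a hypothetical stand-in, and `new_row` and `relabeled` are illustrative names. It also shows why the `levels(x) <- ...` attempt mentioned in the comments gave wrong predictions: that assignment renames levels by position instead of matching values.

```r
# Hypothetical stand-in for the training predictors
training_set <- data.frame(
  Emot    = factor(c("anger", "fear", "happy")),
  Session = factor(c("morning", "evening", "morning"))
)

# A single new observation, given as plain strings
new_row <- data.frame(Emot = "happy", Session = "morning",
                      stringsAsFactors = FALSE)

# Rebuild each column as a factor that carries the training levels
for (col in names(new_row)) {
  new_row[[col]] <- factor(new_row[[col]],
                           levels = levels(training_set[[col]]))
}

# Pitfall: `levels<-` relabels by position, so the value itself changes
relabeled <- factor("happy")
levels(relabeled) <- levels(training_set$Emot)
as.character(relabeled)  # no longer "happy"
```

A row prepared this way keeps its original values while sharing the training set's level sets, so it should be usable as `newdata` in `predict()`. A value that never occurred in training becomes `NA`, which is worth checking before predicting.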

minigeek