I'm trying to run a model with the mlr package but I'm having some problems with the predict() function. It gives me the following error message:

Error in predict(mod, task = task, subset = test) : 
Assertion on 'subset' failed: Must be of type 'integerish', not 'data.frame'

Please find a reproducible example below:

require(mlr)     # models
require(caTools) # sampling
require(Zelig)   # data

data("voteincome")
voteincome$vote <- as.factor(voteincome$vote)

set.seed(0)
sample <- sample.split(voteincome, SplitRatio = .75)
train <- subset(voteincome, sample == TRUE)
test <- subset(voteincome, sample == FALSE)

train <- na.omit(train)
test <- na.omit(test)

task <- makeClassifTask(data = train, target = "vote")
lrnr <- makeLearner("classif.randomForest")
mod <- train(lrnr, task)
pred <- predict(mod, task = task, subset = test)

And then the error appears. Am I doing something wrong? Thanks!

danilofreire

2 Answers

mlr expects a vector of integer row indices for the subset argument, not a data frame. It then subsets the task's data itself, so you don't have to do this yourself. You can also let mlr partition the data into training and test sets with a resample description (see the tutorial):

require(mlr)     # models
require(caTools) # sampling
require(Zelig)   # data

data("voteincome")
voteincome$vote <- as.factor(voteincome$vote)

set.seed(0)
task <- makeClassifTask(data = voteincome, target = "vote")
lrnr <- makeLearner("classif.randomForest")
rdesc <- makeResampleDesc("Holdout", split = 0.75)

res <- resample(learner = lrnr, task = task, resampling = rdesc)

# get predictions on test set
getPredictionResponse(res$pred)

# compute accuracy, also see https://mlr-org.github.io/mlr-tutorial/devel/html/performance/index.html
performance(res$pred, acc)
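
If you would rather keep a manual split, the index-vector idea above can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses the built-in iris data and the "classif.rpart" learner instead of voteincome and randomForest (to avoid extra package dependencies), and the names train.idx and test.idx are made up for the example.

```r
library(mlr)

# Assumption: iris stands in for voteincome so the sketch needs no extra data package.
data(iris)

# Build the task on the FULL data set; the split is expressed as row indices.
task <- makeClassifTask(data = iris, target = "Species")

set.seed(0)
n <- getTaskSize(task)
train.idx <- sample(n, size = round(0.75 * n))  # integer row indices for training
test.idx <- setdiff(seq_len(n), train.idx)      # remaining rows for testing

# Assumption: rpart is used here only because it ships with R.
lrnr <- makeLearner("classif.rpart")
mod <- train(lrnr, task, subset = train.idx)

# Option 1: predict on rows of the task, selected by index
pred.task <- predict(mod, task = task, subset = test.idx)

# Option 2: predict on a separate data frame via newdata
pred.new <- predict(mod, newdata = iris[test.idx, ])

performance(pred.task, measures = acc)
```

Both options should give the same predictions here; subset is for rows that are already part of the task, while newdata is for a data frame the task has never seen.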
Lars Kotthoff

Try this:

# the underlying randomForest model takes new data via newdata, not task/subset
pred <- predict(mod$learner.model, newdata = test)
Vedda