I want to prepare a data set for use in a Task of the mlr package. Should binary independent variables be of class factor, logical, character, or integer? Is it OK to keep categorical variables with more than two levels as factor or character, or are there learners integrated in mlr that require, e.g., a model matrix and for which mlr doesn't do the conversion automatically? Which classes does mlr expect in those cases?

For example:

x1 <- factor(sample(0:1, size=10, replace = TRUE))
x2 <- factor(sample(letters[1:5], size=10, replace = TRUE))
y <- sample(c("yes", "no"), size=10, replace = TRUE)
library(mlr)
makeClassifTask(data = data.frame(y, x1, x2), target = "y", positive="yes")
tover

1 Answer

Yes. If a variable is categorical, store it as a factor. You can of course have more than two levels, although not all learners support this (mlr automatically determines whether a learner is compatible with the task). mlr converts the data in a task into a form suitable for the learner, or tells you that the learner and the task aren't compatible.
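If you would rather control the encoding yourself, mlr also exports createDummyFeatures(), which replaces factor features with numeric indicator columns. The following is only a sketch, assuming mlr is installed; the toy data frame and the names df, task, and dummy_task are made up for illustration:

```r
library(mlr)

# Hypothetical toy data: a binary factor, a 5-level factor, a two-class target.
df <- data.frame(
  y  = factor(rep(c("yes", "no"), times = 5)),
  x1 = factor(rep(0:1, each = 5)),
  x2 = factor(rep(letters[1:5], times = 2))
)
task <- makeClassifTask(data = df, target = "y", positive = "yes")

# Replace every factor feature with indicator columns (default method "1-of-n").
dummy_task <- createDummyFeatures(task)

# Inspect the new feature names and the count of features per type.
getTaskFeatureNames(dummy_task)
getTaskDesc(dummy_task)$n.feat
```

This step is optional; it is mainly useful when you want the same encoding to be applied regardless of which learner is used.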

You can also list the learners suitable for a given task with the function listLearners().
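As a rough illustration, assuming mlr and rpart are installed (the toy data frame and the variable names are made up for this example):

```r
library(mlr)

set.seed(1)  # only to make the toy data reproducible

# Hypothetical toy data in the shape of the question's example.
df <- data.frame(
  y  = factor(rep(c("yes", "no"), times = 5)),
  x1 = factor(rep(0:1, each = 5)),
  x2 = factor(sample(letters[1:5], size = 10, replace = TRUE))
)
task <- makeClassifTask(data = df, target = "y", positive = "yes")

# Learners whose declared properties match this task
# (factor features, two-class target). Returns a data frame.
learners <- listLearners(task, warn.missing.packages = FALSE)
head(learners$class)

# Properties declared by a single learner, e.g. whether it accepts factors.
getLearnerProperties("classif.rpart")
```

The columns of the returned data frame (for example factors or multiclass) show what each learner declares it can handle, which is what mlr uses to decide compatibility.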

Lars Kotthoff