
I want to compare several regression/classification algorithms (e.g. svm, nnet, rpart, randomForest, naiveBayes) on the same data to see which works best, while keeping my code as short and clean as possible. To test all of them, I want to run them in a single mclapply() call from the multicore package:

invisible(lapply(c("party","nnet","caret","klaR","randomForest","e1071","rpart",
                   "multicore"), require, character.only = TRUE))
algorithms <- c(knn3, NaiveBayes, nnet, ctree, randomForest, svm, naiveBayes, rpart)
data(iris)
model <- mclapply(algorithms, function(alg) alg(Species ~ ., iris))

The problem is that some of the algorithms need extra parameters, e.g. nnet() requires the size parameter to be set. This could certainly be handled with several if/else statements, but is there a simpler solution?

Has QUIT--Anony-Mousse
Ali

2 Answers


One thing you could do is replace the entries of algorithms that require additional arguments with partially applied functions, e.g.

algorithms <- c(knn3, ctree, function(...) nnet(..., size=2))
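A minimal sketch of this idea with a named list follows, so that the results carry the algorithm names. It assumes the recommended packages nnet and rpart are installed; the values size = 2 and trace = FALSE are arbitrary choices for illustration, and plain lapply is used here in place of mclapply so the example also runs on Windows.

```r
library(nnet)   # assumption: recommended package, installed alongside R
library(rpart)  # assumption: recommended package, installed alongside R

# Named list of fitting functions. Entries that need extra arguments are
# wrapped in a small closure that forwards everything else through `...`.
algorithms <- list(
  rpart = rpart,
  nnet  = function(...) nnet(..., size = 2, trace = FALSE)  # size = 2 is arbitrary
)

data(iris)

# lapply keeps the names of the list it iterates over.
models <- lapply(algorithms, function(alg) alg(Species ~ ., data = iris))
names(models)
```

On platforms that support forking, the lapply call could be swapped for parallel::mclapply, which takes the same first two arguments; the multicore functionality now lives in the parallel package that ships with R.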
Matthew Plourde
  • Equivalently, with the `functional` package, you can use `Curry(nnet, size=2)` as the function. – Brian Diggs Mar 28 '13 at 21:12
  • It's a great solution. I am thinking if we can easily assign names of algorithms to `names()` of `mclapply()` result, without extra variable definition: i.e. `alg.names <- c("knn3",...` – Ali Mar 28 '13 at 21:28
  • @BrianDiggs aha, `Curry`! I knew this was out there somewhere. – Matthew Plourde Mar 28 '13 at 23:20
  • @Ali atomic vectors accept names, so you could do `c(knn3=knn3, nnet=function(...) nnet(..., 2), etc.)`. I can't recall if `mclapply` will preserve those names, but you could use `names(model) <- names(algorithms)`, in any case. – Matthew Plourde Mar 28 '13 at 23:22

The multicore package does not seem to be available for Windows, but here is one approach, shown as a simple example with ordinary lapply:

# names of the functions as strings
algorithms <- c("lm", "glm")
# arguments for each function (empty list for those which do not need any)
arguments <- list(lm = list(model = FALSE), glm = list(family = gaussian))

# Use lapply with do.call
output <- lapply(seq_along(algorithms), function(i) do.call(what = algorithms[i],
                    args = c(list(formula = y ~ ., data = freeny), arguments[[i]])))
names(output) <- algorithms # Add names to output

The list output now contains the result from each algorithm. Note that at first glance the printed results (e.g. output$lm) look a bit awful: printing an lm or glm object shows the function call along with the summary, and the call is quite long here because do.call inserts the evaluated arguments, including the whole data frame, into it.
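One possible way to keep the printed call short is to pass the data as an unevaluated symbol, so the recorded call contains the name freeny and not the data frame itself. The sketch below is a variation on the code above, not the original answer's method: the names spec and common are made up for illustration, and family is given as the string "gaussian" (which glm accepts) for the same reason.

```r
# Per-algorithm extra arguments, keyed by function name (hypothetical `spec`).
spec <- list(
  lm  = list(model = FALSE),
  glm = list(family = "gaussian")
)

# Arguments shared by every fit. quote(freeny) stores the symbol, so the
# call recorded by lm/glm reads `data = freeny`.
common <- list(formula = y ~ ., data = quote(freeny))

# Map pairs each function name with its extra arguments; because the first
# vector is an unnamed character vector, its values become the result names.
output <- Map(function(f, extra) do.call(f, c(common, extra)),
              names(spec), spec)

output$lm$call
```

Map is used here only to avoid indexing by position; the do.call pattern itself is unchanged.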

edit: Some small tweaking.

Jouni Helske