I am trying to replicate the method from Caruana et al., "Ensemble Selection from Libraries of Models". At the core of the method is a greedy algorithm that adds models to the ensemble one at a time (a model can be added more than once). I've written an implementation of this greedy optimization, but it is very slow:
```r
library(compiler)

set.seed(42)
X <- matrix(runif(100000*10), ncol=10)  # stand-in predictions from 10 models on 100,000 cases
Y <- rnorm(100000)                       # target values

greedOpt <- cmpfun(function(X, Y, iter=100){
  weights <- rep(0, ncol(X))
  while(sum(weights) < iter) {
    # RMSE of the ensemble if each model in turn were added once more
    errors <- sapply(1:ncol(X), function(y){
      newweights <- weights
      newweights[y] <- newweights[y] + 1
      pred <- X %*% (newweights)/sum(newweights)
      error <- Y - pred
      sqrt(mean(error^2))
    })
    # add (again) the model that gives the lowest RMSE
    update <- which.min(errors)
    weights[update] <- weights[update]+1
  }
  return(weights/sum(weights))
})

system.time(a <- greedOpt(X,Y))
```
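For reference, each pass of the `while` loop in `greedOpt` performs the following greedy step. This is my restatement of the code, writing $x_j$ for column $j$ of `X`, $w$ for `weights`, $e_j$ for the $j$-th unit vector, $n$ for the number of rows and $p$ for the number of models:

$$
j^\ast = \arg\min_{j \in \{1,\dots,p\}} \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \frac{(Xw)_i + X_{ij}}{\mathbf{1}^\top w + 1}\right)^2},
\qquad w \leftarrow w + e_{j^\ast}
$$

This is repeated until $\mathbf{1}^\top w = \text{iter}$, and the function returns $w / \mathbf{1}^\top w$. Because $X(w + e_j) = Xw + x_j$, the product $Xw$ is the same for every candidate $j$ within one step.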
I know explicit loops tend to be slow in R, but I can't think of any way to do this kind of stepwise search without one.
Any suggestions for speeding up this function?
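One possible direction, sketched under my own assumptions rather than taken from the paper: keep the outer loop over greedy steps, since each step depends on the previous choice, but replace the inner `sapply()` with column-wise matrix arithmetic. The sketch also maintains the current ensemble prediction `X %*% weights` as a running sum instead of recomputing the full matrix product for every candidate. The name `greedOptFast` is hypothetical, and it assumes `iter` is a positive whole number.

```r
# Hypothetical helper: same greedy rule as greedOpt, vectorized over candidates.
# Assumes iter is a positive whole number.
greedOptFast <- function(X, Y, iter = 100) {
  weights <- rep(0, ncol(X))
  runningSum <- numeric(nrow(X))  # always equals X %*% weights
  for (k in seq_len(iter)) {
    # Column j holds the ensemble prediction if model j is added once more:
    # (X %*% weights + X[, j]) / k. The length-nrow(X) vector runningSum
    # is recycled down each column of X.
    preds <- (runningSum + X) / k
    # RMSE of every candidate at once (Y is recycled the same way)
    rmse <- sqrt(colMeans((Y - preds)^2))
    update <- which.min(rmse)
    weights[update] <- weights[update] + 1
    runningSum <- runningSum + X[, update]  # incremental update of X %*% weights
  }
  weights / sum(weights)
}
```

Used on the data above, `system.time(b <- greedOptFast(X, Y))` followed by `all.equal(a, b)` should confirm that the two functions pick the same weights up to floating-point rounding. The trade-off is memory: `preds` is a full `nrow(X)` by `ncol(X)` matrix at every step.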