
I am trying to derive prediction errors for ensemble models fitted using makeStackedLearner in the mlr package. These are the steps I am following:

> library(mlr)
> library(matrixStats)
> data(BostonHousing, package = "mlbench")
> BostonHousing$chas = as.numeric(BostonHousing$chas)
> tsk = makeRegrTask(data = BostonHousing, target = "medv")
> base = c("regr.rpart", "regr.svm", "regr.ranger")
> lrns = lapply(base, makeLearner)
> m = makeStackedLearner(base.learners = lrns,
+                        predict.type = "response", method = "stack.cv", super.learner = "regr.lm")
> tmp = train(m, tsk)
> summary(tmp$learner.model$super.model$learner.model)

Call:
stats::lm(formula = f, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.8014  -1.5154  -0.2479   1.2160  23.6530 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.76991    0.43211  -6.410 3.35e-10 ***
regr.rpart  -0.09575    0.04858  -1.971   0.0493 *  
regr.svm     0.17379    0.07710   2.254   0.0246 *  
regr.ranger  1.04503    0.08904  11.736  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.129 on 502 degrees of freedom
Multiple R-squared:  0.885, Adjusted R-squared:  0.8843 
F-statistic:  1287 on 3 and 502 DF,  p-value: < 2.2e-16
> res = predict(tmp, tsk)

Note that I use method = "stack.cv", which means that every time the models are refitted using makeStackedLearner the numbers will be slightly different. My questions are:

  1. Is the R-squared derived from the super.learner model an objective measure of predictive power? (I assume it should be, since it is based on cross-validation with refitting; see also the resampling sketch after this list.)
> ## cross-validation R-square
> sm = tmp$learner.model$super.model$learner.model
> y = model.response(model.frame(sm))
> round(1 - deviance(sm) / sum((y - mean(y))^2), 3)
[1] 0.872
  2. How can I derive the prediction error (prediction interval) for all newdata rows?
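
A more direct way to check the predictive power of the whole stacked learner is to resample it; a minimal sketch using mlr's built-in resample() with the predefined cv5 resampling and the rsq / rmse measures (the numbers will again vary between runs):

## estimate out-of-sample R-squared and RMSE for the stacked learner
r = resample(m, tsk, resampling = cv5, measures = list(rsq, rmse))
r$aggr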

The method I use at the moment simply takes the standard deviation of the multiple independent base learner predictions (which I treat as the model error):

> res.all <- getStackedBaseLearnerPredictions(tmp)
> ## p-values of the base learners in the meta-model
> wt <- summary(tmp$learner.model$super.model$learner.model)$coefficients[-1,4]
> ## row-wise SD across the significant base learners (p < 0.05)
> res.all$model.error <- matrixStats::rowSds(
+        as.matrix(as.data.frame(res.all))[,which(wt<0.05)], na.rm=TRUE)
> res$data[1,]
  id truth response
1  1    24 26.85235
> res.all$model.error[1]
[1] 2.24609

So in this case the predicted value is 26.85, the truth is 24, and the prediction error is estimated at about 2.25. Again, because the stack.cv method is used, every time you refit the models you get slightly different values. Are you aware of any similar approach for deriving prediction errors for ensemble models? Thanks in advance.

Tom Hengl
    Not sure if I understand your question correctly. You can use any of the performance measures that mlr supports with the stacked learner just like with any other learner. In principle, there is no difference in computing the error for ensemble models vs. "normal" models. Unless I'm missing something in your question? – Lars Kotthoff Sep 16 '19 at 01:46
  • I am looking at deriving the prediction error (also known as the prediction interval) for each new prediction location. In a simple regression example you would use: `predict(eruption.lm, newdata, interval="predict")` (see: http://www.r-tutor.com/elementary-statistics/simple-linear-regression/prediction-interval-linear-regression). For stacked learners in R there is no such option (predict also upper/lower intervals, not only the mean value). – Tom Hengl Sep 16 '19 at 08:01
  • You can set the `predict.type` to `se`. Is that what you're looking for? – Lars Kotthoff Sep 16 '19 at 15:38
  • This does not work on StackedLearner: `> m.se = makeStackedLearner(base.learners = lrns, + predict.type="se", method = "stack.cv", super.learner = "regr.glm") Error in makeStackedLearner(base.learners = lrns, predict.type = "se", : Predicting standard errors currently not supported.` – Tom Hengl Sep 16 '19 at 15:54
  • I think I kind of answered my own questions. Now just looking if there are better ideas (statistical/mathematical validity also very important obviously). – Tom Hengl Sep 16 '19 at 15:56
  • Ok, but this is the kind of thing you're looking for? You can probably derive this from the predictions of the individual learners to some extent. But yes, it sounds like there's some theoretical work to be done here before implementation. – Lars Kotthoff Sep 16 '19 at 15:58

1 Answer


To derive prediction intervals (individual errors at new prediction locations), we can use the predict.lm function:

> m = makeStackedLearner(base.learners = lrns, predict.type = "response",
+        method = "stack.cv", super.learner = "regr.lm")
> tmp = train(m, tsk)
> tmp$learner.model$super.model$learner.model

Call:
stats::lm(formula = f, data = d)

Coefficients:
 (Intercept)   regr.rpart     regr.svm  regr.ranger  
   -2.5879      -0.0971       0.3549       0.8635  

> res.all <- getStackedBaseLearnerPredictions(tmp)
> pred.error = predict(tmp$learner.model$super.model$learner.model,
+         newdata = res.all, interval = "prediction", level=2/3)
> str(pred.error)
 num [1:506, 1:3] 29.3 23.3 34.6 36 33.6 ...
 - attr(*, "dimnames")=List of 2
 ..$ : chr [1:506] "1" "2" "3" "4" ...
 ..$ : chr [1:3] "fit" "lwr" "upr"
> summary(tmp$learner.model$super.model$learner.model$residuals)
   Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-11.8037  -1.5931  -0.3161   0.0000   1.1951  29.2145 
> mean((pred.error[,3]-pred.error[,2])/2)
[1] 3.253142
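
To apply the same interval to genuinely new rows, one can feed new base learner predictions into predict.lm; a sketch, assuming getStackedBaseLearnerPredictions() also accepts a newdata argument (newdf below simply stands in for new observations):

## hypothetical new data: here just the first five BostonHousing rows
newdf = BostonHousing[1:5, ]
## base learner predictions for the new rows (newdata argument assumed)
new.bl = as.data.frame(getStackedBaseLearnerPredictions(tmp, newdata = newdf))
## prediction intervals from the lm super learner
predict(tmp$learner.model$super.model$learner.model,
        newdata = new.bl, interval = "prediction", level = 2/3)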

This is an example with an lm model as the super learner. The level argument can be used to pass different probabilities (2/3 corresponds to roughly ±1 standard deviation). Prediction errors at new data should be somewhat larger than those obtained on the training data (depending on the amount of extrapolation). This approach could also be extended to, e.g., Random Forest models (see the ranger package) and quantile regression forests for deriving prediction intervals (Hengl et al. 2019). Note that for this type of analysis there should be at least two base learners (three are recommended).
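
As an illustration of the quantile regression forest idea, one could fit a ranger model with quantreg = TRUE directly on the base learner predictions and predict quantiles; this is only a sketch (fitted outside of mlr, with illustrative object names), not the implementation used in Hengl et al. (2019):

library(ranger)
## meta-level data: base learner predictions plus the target
d = as.data.frame(getStackedBaseLearnerPredictions(tmp))
d$medv = BostonHousing$medv
## quantile regression forest as an alternative super learner
qrf = ranger(medv ~ ., data = d, quantreg = TRUE, num.trees = 500)
## lower / median / upper quantiles; 1/6 and 5/6 give a ~2/3 interval
q = predict(qrf, data = d, type = "quantiles",
            quantiles = c(1/6, 0.5, 5/6))$predictions
head(q)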

Tom Hengl