
I am stacking models in R as follows:

ctrl <- trainControl(method = "repeatedcv",
                     number = 5,
                     repeats = 3,
                     returnResamp = "final",
                     savePredictions = "final",
                     classProbs = TRUE,
                     selectionFunction = "oneSE",
                     verboseIter = TRUE)

models_stack <- caretStack(
  model_list,
  data=train_data,
  tuneLength=10,
  method="glmnet",
  metric="ROC",
  trControl=ctrl
)

1) Why am I seeing the following error? What can I do? I am stuck now.

Timing stopped at: 0.89 0.005 0.91
Error in (function (x, y, family = c("gaussian", "binomial", "poisson", : unused argument (data = list(c(-0.00891097103286995, 0.455282701499392, 0.278236211515583, 0.532932725880776, 0.511036607368827, 0.688757947257125, -0.560727863490874, -0.21768155316146, 0.642219917023467, 0.220363129901216, 0.591732278371339, 1.02850020403572, -1.02417799431585, 0.806359545011601, -1.21490317454699, -0.671361009441299, 0.927344615788642, -0.10449847318776, 0.595493217624868, -1.05586363903119, -0.138457794869817, -1.026253562838, -1.38264471633224, -1.32900800143341, 0.0383617314263342, -0.82222313323842, -0.644251885665736, -0.174126438952992, 0.323934240274895, -0.124613523895458, 0.299359713721601, -0.723599218327519, -0.156528054435544, -0.76193093842169, 0.863217455799044, -1.01340448660914, -0.314365383747751, 1.19150804114605, 0.314703439577839, 1.55580594654149, -0.582911462615421, -0.515291378382375, 0.305142268138296, 0.513989405541095, -1.85093305614114, 0.436468060668601, -2.18997828727424, 1.12838871469007, -1.17619542016998, -0.218175589380355

2) Is caretStack not supposed to take a "data" parameter? If I need to use a different dataset for my level 1 supervisor model, what can I do?

3) Also I wanted to use AUC/ROC but got these errors

The metric "AUC" was not in the result set. Accuracy will be used instead.

and

The metric "ROC" was not in the result set. Accuracy will be used instead.

I have seen online examples that use ROC. Is it unavailable for this model? What metrics besides Accuracy can I use here? If I need ROC, what are my options?

As requested by @RLave, this is how my model_list is built:

grid.xgboost <- expand.grid(.nrounds = c(40, 50, 60),
                            .eta = c(0.2, 0.3, 0.4),
                            .gamma = c(0, 1),
                            .max_depth = c(2, 3, 4),
                            .colsample_bytree = c(0.8),
                            .subsample = c(1),
                            .min_child_weight = c(1))

grid.rf <- expand.grid(.mtry=3:6)

model_list <- caretList(y ~.,
                    data=train_data_0,
                    trControl=ctrl,
                    tuneList=list(
                      xgbTree=caretModelSpec(method="xgbTree", tuneGrid=grid.xgboost),
                      rf=caretModelSpec(method="rf", tuneGrid=grid.rf)
                    )
  )

My train_data_0 and train_data both come from the same dataset. All predictors are numeric, and the label is binary.

halfer
yeeen

1 Answer


Your post contains three questions:

  1. Why am I seeing the following error? What can I do? I am stuck now.

caretStack should not be given a data argument: its training data is generated from the predictions of the models in the caretList. Take a look at this reproducible example:

library(caret)
library(caretEnsemble)
library(mlbench)

using the Sonar data set:

data(Sonar)

create grid for hyper parameter tune for xgboost:

grid.xgboost <- expand.grid(.nrounds = c(40, 50, 60),
                            .eta = c(0.2, 0.3, 0.4),                
                            .gamma = c(0, 1),
                            .max_depth = c(2, 3, 4),
                            .colsample_bytree = c(0.8),                
                            .subsample = c(1), 
                            .min_child_weight = c(1))

create grid for rf tune:

grid.rf <- expand.grid(.mtry = 3:6)

create train control:

ctrl <- trainControl(method="cv",
                     number=5,
                     returnResamp = "final",
                     savePredictions = "final",
                     classProbs = TRUE,
                     selectionFunction = "oneSE",
                     verboseIter = TRUE,
                     summaryFunction = twoClassSummary)

tune the models:

model_list <- caretList(Class ~.,
                        data = Sonar,
                        trControl = ctrl,
                        tuneList = list(
                          xgbTree = caretModelSpec(method="xgbTree",
                                                   tuneGrid = grid.xgboost),
                          rf = caretModelSpec(method = "rf",
                                              tuneGrid = grid.rf))
)

create the stacked ensemble:

models_stack <- caretStack(
  model_list,
  tuneLength = 10,
  method ="glmnet",
  metric = "ROC",
  trControl = ctrl
)
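On the error message itself: the signature it prints, (x, y, family = c("gaussian", "binomial", "poisson", ...), looks like the glmnet fitting function, which suggests the stray data argument was forwarded through `...` until it reached a function that does not accept it. A minimal base-R sketch of that mechanism follows. The functions fit_inner and stack_wrapper are hypothetical stand-ins, not part of caret or caretEnsemble.

```r
# Stand-in for a fitting function that, like glmnet, has no `data` argument
fit_inner <- function(x, y) {
  length(x) + length(y)
}

# Stand-in for a wrapper that forwards any extra arguments via `...`
stack_wrapper <- function(x, y, ...) {
  fit_inner(x, y, ...)
}

# Without the stray argument the call succeeds
ok <- stack_wrapper(1:3, 4:6)

# With `data = ...` the inner function rejects the argument it does not know
msg <- tryCatch(
  stack_wrapper(1:3, 4:6, data = list(1)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Dropping data from the caretStack call, as in the example above, avoids forwarding it in the first place.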

2) Is caretStack not supposed to take a "data" parameter? If I need to use a different dataset for my level 1 supervisor model, what can I do?

caretStack needs only the predictions from the base models. To build an ensemble from models trained on different data, create a new caretList with the appropriate data specified there.
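If you really want the level 1 model fitted on rows the base models never saw, one option is to do the stacking by hand with plain caret instead of caretStack. The sketch below is only an illustration under assumptions: the 50/50 split, the names base_data and meta_data, and the lightweight learners (rpart, knn, and a glm meta-model in place of xgbTree, rf and glmnet) are all hypothetical choices made to keep it quick to run.

```r
library(caret)
library(mlbench)

data(Sonar)
set.seed(42)

# Hypothetical split: one half for the base models, one half for the meta-model
idx <- createDataPartition(Sonar$Class, p = 0.5, list = FALSE)
base_data <- Sonar[idx, ]
meta_data <- Sonar[-idx, ]

ctrl <- trainControl(method = "cv",
                     number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

# Base learners are fitted on base_data only
fit_rpart <- train(Class ~ ., data = base_data, method = "rpart",
                   metric = "ROC", trControl = ctrl)
fit_knn <- train(Class ~ ., data = base_data, method = "knn",
                 metric = "ROC", trControl = ctrl)

# Level 1 training set: base-model probabilities for class "M" on meta_data
meta_train <- data.frame(
  rpart = predict(fit_rpart, newdata = meta_data, type = "prob")[, "M"],
  knn   = predict(fit_knn, newdata = meta_data, type = "prob")[, "M"],
  Class = meta_data$Class
)

# Meta-model learns how to combine the base-model probabilities
meta_fit <- train(Class ~ ., data = meta_train, method = "glm",
                  metric = "ROC", trControl = ctrl)
print(meta_fit)
```

The trade-off is that each stage sees less data than it would with the out-of-fold predictions that caretStack reuses, so this is worth it mainly when you have plenty of rows.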

3) Also I wanted to use AUC/ROC but got these errors

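The easiest way to use AUC as the metric is to set summaryFunction = twoClassSummary in trainControl (together with classProbs = TRUE, which you already have). twoClassSummary reports the area under the ROC curve under the name "ROC", so keep metric = "ROC"; "AUC" is not a column it produces.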
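To see why the warning appears, it can help to compare the result columns with and without the summary function. This is a small sketch on Sonar with rpart, chosen only because it is fast; the object names ctrl_default, ctrl_roc, fit_default and fit_roc are made up for the illustration.

```r
library(caret)
library(mlbench)

data(Sonar)

# Default summary: only Accuracy and Kappa are computed
ctrl_default <- trainControl(method = "cv", number = 5, classProbs = TRUE)

# twoClassSummary: ROC (area under the curve), Sens and Spec are computed
ctrl_roc <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                         summaryFunction = twoClassSummary)

set.seed(1)
fit_default <- train(Class ~ ., data = Sonar, method = "rpart",
                     trControl = ctrl_default)

set.seed(1)
fit_roc <- train(Class ~ ., data = Sonar, method = "rpart",
                 metric = "ROC", trControl = ctrl_roc)

# Compare which metrics each fit has available for model selection
print(names(fit_default$results))
print(names(fit_roc$results))
```

Whatever name you pass as metric has to match one of the columns produced by the summary function, otherwise caret falls back to Accuracy with the warning you saw.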

missuse