
Hi, my name is Abhi and I am using caret to build a gbm tree-based model. However, instead of accuracy I would like to use ROC as my metric.

Here is the code I have so far

myTuneGrid <- expand.grid(n.trees = 500,interaction.depth = 11,shrinkage = 0.1)
fitControl <- trainControl(method = "repeatedcv", number = 7,repeats = 1, verboseIter = FALSE,returnResamp = "all",classProbs = TRUE)
myModel <- train(Cover_Type ~ .,data = modelData,method = "gbm",trControl = fitControl,tuneGrid = myTuneGrid,metric='roc')

However when I run this code I get a warning

Warning message:
In train.default(x, y, weights = w, ...) :
The metric "roc" was not in the result set. Accuracy will be used instead.

How do I force my model to use ROC instead of accuracy? What am I doing wrong here?

Abhi
    There are examples of using caret for gbm models on the [caret website](http://topepo.github.io/caret/training.html). I suspect, at first glance, that your warning message is a result of not specifying `twoClassSummary` as the summary function in `trainControl` and possibly not capitalizing 'roc' to 'ROC' – cdeterman Oct 10 '14 at 20:01
  • Changed my trainControl to trainControl(method = "repeatedcv", number = 7,metric = 'roc',summaryFunction=twoClassSummary,repeats = 1, verboseIter = FALSE,returnResamp = "all",classProbs = TRUE) but still no luck – Abhi Oct 10 '14 at 20:36
  • Can you confirm if you can run the following [gist](https://gist.github.com/cdeterman/d0e38a768b1a55d9b900) without the warning message you show? It is nothing more than the demo from the caret website with your additional grid and matching arguments. It also would be best to check if the 'pROC' package is installed. – cdeterman Oct 13 '14 at 12:33
  • My final variable had 7 classes instead of 2. When I replaced twoClassSummary with multiClassSummary the code worked fine. I got the code for multiClassSummary online – Abhi Oct 14 '14 at 15:42
  • Glad you solved your problem. You can answer your own question then. Please provide a link for other interested users. – cdeterman Oct 14 '14 at 15:49

3 Answers


Here is the link to the GitHub project with the source code for multiClassSummary: https://github.com/rseiter/PracticalMLProject/blob/master/multiClassSummary.R
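For anyone landing here later, the fix described in the comments would look roughly like this (a sketch, assuming a recent caret release that bundles `multiClassSummary`; on older versions you would first source the function from the link above, and the metric column name may differ from "AUC"):

```r
library(caret)

# multiClassSummary needs class probabilities to compute AUC-type metrics
fitControl <- trainControl(method = "repeatedcv", number = 7, repeats = 1,
                           classProbs = TRUE,
                           summaryFunction = multiClassSummary)

# With a 7-class outcome, optimize the mean per-class AUC rather than "ROC"
# ("AUC" is the column multiClassSummary reports when classProbs = TRUE)
myModel <- train(Cover_Type ~ ., data = modelData, method = "gbm",
                 trControl = fitControl,
                 tuneGrid = myTuneGrid,
                 metric = "AUC")
```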

Abhi

It should work if you specify twoClassSummary as the summaryFunction in trainControl and also use metric = "ROC" (instead of metric = 'roc' in your code):

df <- iris
df$Species <- factor(ifelse(df$Species == "versicolor", "v", "o"))

fitControl <- trainControl(method = "cv", returnResamp = "all",
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)

myModel <- train(Species ~ ., data = df, method = "gbm",
                 trControl = fitControl, metric = "ROC")

Stochastic Gradient Boosting 

150 samples
  4 predictor
  2 classes: 'o', 'v' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 
Resampling results across tuning parameters:

  interaction.depth  n.trees  ROC    Sens  Spec
  1                   50      0.988  0.98  0.92
  1                  100      0.980  0.97  0.94
  1                  150      0.972  0.96  0.94
  2                   50      0.984  0.97  0.94
  2                  100      0.976  0.96  0.92
  2                  150      0.960  0.97  0.92
  3                   50      0.984  0.97  0.94
  3                  100      0.968  0.98  0.92
  3                  150      0.968  0.96  0.92

Tuning parameter 'shrinkage' was held constant at a value of 0.1

Tuning parameter 'n.minobsinnode' was held constant at a value of 10
ROC was used to select the optimal model using the largest value.
The final values used for the model were n.trees = 50, interaction.depth =
 1, shrinkage = 0.1 and n.minobsinnode = 10.
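Once trained, you can sanity-check the model by scoring class probabilities and computing a ROC curve yourself (a sketch; it assumes the pROC package is installed, which caret already relies on for twoClassSummary, and reuses the `df` / `myModel` objects from above):

```r
library(pROC)

# Class probabilities; the "v" column is the probability of class "v"
probs <- predict(myModel, newdata = df, type = "prob")

# ROC/AUC on the training data (optimistic, but illustrates the call)
rocObj <- roc(response = df$Species, predictor = probs$v)
auc(rocObj)
```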
StupidWolf
    ctrl <- trainControl(method = "repeatedcv",   
                         number = 10, repeats = 2,                          
                         summaryFunction=twoClassSummary,   
                         classProbs=TRUE,
                         allowParallel = TRUE)
    gbm <- train(income~age+education_num+sex+hours_per_week, data = newdata,
                      method = "gbm",
                      metric = "ROC",
                      trControl = ctrl,
                      verbose=FALSE)


Stochastic Gradient Boosting 

1000 samples
   4 predictor
   2 classes: 'small', 'large' 

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 2 times) 
Summary of sample sizes: 900, 900, 900, 901, 899, 900, ... 
Resampling results across tuning parameters:

  interaction.depth  n.trees  ROC        Sens       Spec     
  1                   50      0.8237040  0.9535458  0.3064312
  1                  100      0.8225003  0.9338944  0.3637681
  1                  150      0.8209603  0.9319378  0.3725543
  2                   50      0.8268678  0.9280075  0.3874094
  2                  100      0.8258134  0.9214457  0.4150362
  2                  150      0.8232040  0.9168831  0.4317029
  3                   50      0.8236631  0.9195062  0.4252717
  3                  100      0.8218651  0.9116285  0.4297101
  3                  150      0.8168575  0.9063910  0.4341486

Tuning parameter 'shrinkage' was held constant at a value of 0.1

Tuning parameter 'n.minobsinnode' was held constant at a value of 10
ROC was used to select the optimal model using the largest value.
The final values used for the model were n.trees = 50, interaction.depth =
 2, shrinkage = 0.1 and n.minobsinnode = 10.
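This follows the same pattern as the accepted approach; to see which tuning combination won under ROC, you can inspect the fitted object afterwards (a sketch, assuming the model object is named `gbm` as in the code above):

```r
gbm$bestTune       # winning n.trees / interaction.depth / shrinkage / n.minobsinnode
head(gbm$resample) # per-resample ROC, Sens and Spec for the final model
```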