
I am in the process of creating a radial SVM classification model, and I would like to perform 5-fold CV on it and tune it. I have seen how others do it here and followed those instructions. However, my code does not apply my tuning grid. Also, I do not understand why I cannot get Accuracy or an F1 value when I train the model explicitly.

With 5-fold CV

library(caret)
set.seed(500)
ctrl <- trainControl(method = "repeatedcv",
                      number = 5,
                      repeats = 3, 
                      classProb=T,
                      summaryFunction = twoClassSummary
                     )
sigma<-c(2^-15,2^-13,2^-11,2^-9,2^-7,2^-5,2^-3,2^-1,2^1,2^2,2^3)
C<-c(2^-5,2^-3,2^-1,2^1,2^2,2^3,2^5,2^7,2^9,2^11,2^13)
tuninggrid<-data.frame(expand.grid(sigma,C))

mod <- train(x = iris[-5], y=iris$Species,
             method = "svmRadial", 
             trControl = ctrl,
             metric=c('ROC'),
             tunegrid=tuninggrid)

The results simply say that sigma was held constant. Why does train() not use my tuning grid?

Secondly, when I change the metric from 'ROC' to 'Accuracy', it says Accuracy is not available. I understand this is because of my summaryFunction in trainControl. If I remove it, then I can get Accuracy, but not ROC. Ultimately, I would like both, plus an F1 value, but I cannot find documentation on this. How would I write something that gives me all of them at the same time?

Lastly, regarding the output from train(): to get the weights, it is just mod$finalModel@coef, correct?

Jack Armstrong
  • At first blush, have you looked at str(mod) and summary(mod)? – meh May 10 '19 at 15:31
  • Also, my recollection of how this works is that you have requested that the CV be done 5 times, not that you have created 5-fold CV. – meh May 10 '19 at 15:32
  • Ohh. So how would you fix that then? – Jack Armstrong May 10 '19 at 15:34
  • If you don't get an answer I'll try and give more details later. However, if you google the relevant terms you will find a lot of caret documentation exists online. Honestly that is, for your own edification, the best procedure. – meh May 10 '19 at 15:48
  • As much as I have reviewed it, I am still struggling to understand it. – Jack Armstrong May 10 '19 at 15:49
  • I suggest you read [this](http://topepo.github.io/caret/model-training-and-tuning.html#model-training-and-parameter-tuning) and all your questions should be answered. If still in doubt, post an update to the question. – missuse May 12 '19 at 12:26
  • I updated the question after doing the reading to make it clearer and more specific. Not all of my questions are answered, but it is heading in the right direction. – Jack Armstrong May 19 '19 at 09:39

1 Answer


There are a few small errors in your code:

  1. If you want to use the area under the ROC curve as the metric, you need to specify twoClassSummary as you did, but your response variable must also be binary (a two-level factor). For example:
    train(..., y = factor(ifelse(iris$Species=="setosa", "setosa", "other")), ...)
    
  2. If you want to use accuracy as metric, use defaultSummary instead of twoClassSummary

  3. If you View(tuninggrid) you will see that its column names are Var1 and Var2, whereas caret requires them to match svmRadial's tuning parameters, sigma and C. You can fix its definition:

    tuninggrid <- expand.grid(sigma = sigma, C = C)
    
  4. There is a typo in the call to train(...): the correct argument name is tuneGrid, not tunegrid (R is case sensitive).

Fixing these will solve your problem: View(mod$results)
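
Putting the four fixes together, here is a sketch of the corrected call (it reuses the sigma and C vectors defined in the question; binarising iris$Species to setosa-vs-other is just one illustrative choice):

library(caret)
set.seed(500)

ctrl <- trainControl(method = "repeatedcv",
                     number = 5,          # 5 folds
                     repeats = 3,         # repeated 3 times
                     classProbs = TRUE,   # required for class-probability metrics like ROC
                     summaryFunction = twoClassSummary)

# named columns so caret can match them to svmRadial's tuning parameters
tuninggrid <- expand.grid(sigma = sigma, C = C)

# twoClassSummary needs a two-level factor response
y <- factor(ifelse(iris$Species == "setosa", "setosa", "other"))

mod <- train(x = iris[-5], y = y,
             method = "svmRadial",
             trControl = ctrl,
             metric = "ROC",
             tuneGrid = tuninggrid)   # tuneGrid, not tunegrid

View(mod$results)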

EDIT: if you want to optimize accuracy (computed by defaultSummary) but also display the AUROC (from twoClassSummary) and/or an F measure (from prSummary), you can define your own summary function that combines them all and use it in trainControl:

combinedSummary <- function(data, lev = NULL, model = NULL) {
  c(
    defaultSummary(data, lev, model),   # Accuracy, Kappa
    twoClassSummary(data, lev, model),  # ROC, Sens, Spec
    prSummary(data, lev, model)         # AUC, Precision, Recall, F (needs the MLmetrics package)
  )
}
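
For example, a usage sketch (prSummary relies on the MLmetrics package, and classProbs must stay TRUE; the metric passed to train still picks the single value that is optimised, while the other columns are simply reported alongside it):

library(MLmetrics)   # required by prSummary

ctrl <- trainControl(method = "repeatedcv",
                     number = 5,
                     repeats = 3,
                     classProbs = TRUE,
                     summaryFunction = combinedSummary)

mod <- train(x = iris[-5],
             y = factor(ifelse(iris$Species == "setosa", "setosa", "other")),
             method = "svmRadial",
             trControl = ctrl,
             metric = "Accuracy",    # optimised; the other metrics are still reported
             tuneGrid = tuninggrid)

mod$results   # includes Accuracy, Kappa, ROC, Sens, Spec, AUC, Precision, Recall, F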
Pierre Gramme
  • How would you get Accuracy and ROC and F1 all at once though? Or can you only do one or the other? – Jack Armstrong May 21 '19 at 15:07
  • The objective function that you optimize should be one-dimensional. Otherwise, how would you decide which option is best between (Acc=0.81, AUC=0.95, F1=0.87) and (Acc=0.95, AUC=0.87, F1=0.81)? So the best approach is to choose one or the other, and if really necessary you can still build your own performance metric. – Pierre Gramme May 22 '19 at 17:04
  • Okay. That makes sense. So if I wanted to maximize accuracy I would not define the twoClassSummary part. But then I am assuming there must be ways to get the AUC and F1? – Jack Armstrong May 23 '19 at 15:31
  • Now I understand your comment better... I've edited the answer – Pierre Gramme May 23 '19 at 16:02
  • To follow up: the metric I select in train is 'AUC', not 'ROC', and within `View(mod$results)` I assume I am using the AUC value, not the ROC value that it produces? – Jack Armstrong Jun 10 '19 at 18:27
  • Indeed, `metric='ROC'` computes and optimises the AUC (which is the area under the ROC curve) – Pierre Gramme Jun 11 '19 at 06:56
  • I tested both, as in with `metric='ROC'` and then `'AUC'` and got the same results after I posted this, just wanted to confirm. – Jack Armstrong Jun 11 '19 at 15:52
  • Nice, thanks: I didn't know you could use value 'AUC' as alias for 'ROC' – Pierre Gramme Jun 12 '19 at 07:37