
I was told to use the caret package to perform support vector machine (SVM) regression with 10-fold cross-validation on a data set I have. I'm modelling my response variable against 151 predictor variables. I did the following:

> ctrl <- trainControl(method = "repeatedcv", repeats = 10)
> set.seed(1500)
> mod <- train(RT..seconds.~., data=cadets, method = "svmLinear", trControl = ctrl)

and got:

C    RMSE  Rsquared  RMSE SD  Rsquared SD
  0.2  50    0.8       20       0.1        
  0.5  60    0.7       20       0.2        
  1    60    0.7       20       0.2   

But I want to be able to look at my folds and see, for each of them, how close the predicted values were to the actual values. How do I go about this?

The output also says:

RMSE was used to select the optimal model using  the smallest value.
The final value used for the model was C = 0.

I was just wondering what this means, and what the C in the table above stands for.

RT (seconds)  76_TI2  114_DECC  120_Lop  212_PCD  236_X3Av
38            4.086   1.2       2.322    0        0.195
40            2.732   0.815     1.837    1.113    0.13
41            4.049   1.153     2.117    2.354    0.094
41            4.049   1.153     2.117    3.838    0.117
42            4.56    1.224     2.128    2.38     0.246
42            2.96    0.909     1.686    0.972    0.138
42            3.237   0.96      1.922    1.202    0.143
44            2.989   0.8       1.761    2.034    0.11
44            1.993   0.5       1.5      0        0.102
44            2.957   0.8       1.761    0.988    0.141
44            2.597   0.889     1.888    1.916    0.114
44            2.428   0.691     1.436    1.848    0.089

This is a snippet of my dataset. I'm trying to model RT (seconds) against 151 variables.
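A side note on the column names: the formula above uses `RT..seconds.` rather than `RT (seconds)`. Assuming the data were read with `read.csv()` (an assumption, since the import step isn't shown), R passes the headers through `make.names()`, which is presumably where that name comes from. A quick base-R check:

```r
# Headers as they appear in the data snippet above
raw_names <- c("RT (seconds)", "76_TI2", "114_DECC", "120_Lop", "212_PCD", "236_X3Av")

# read.csv()/read.table() with check.names = TRUE (the default) pass headers
# through make.names(): invalid characters become "." and names that start
# with a digit get an "X" prefix.
syntactic <- make.names(raw_names, unique = TRUE)
print(syntactic)
```

This also means the predictors are referred to as `X76_TI2`, `X114_DECC` and so on inside R.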

Thanks

Richie Cotton
user2062207



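You have to save your cross-validation predictions with the `savePredictions` option of your `trainControl` object (`savePred=T` and `classProb=T` below are abbreviations of `savePredictions = TRUE` and `classProbs = TRUE` that R's partial argument matching expands). I'm not sure which package your `cadets` data comes from, but here is a trivial example using `iris`: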

> library(caret)
> ctrl <- trainControl(method = "cv", savePred=T, classProb=T)
> mod <- train(Species~., data=iris, method = "svmLinear", trControl = ctrl)
> head(mod$pred)
        pred        obs      setosa  versicolor   virginica rowIndex   .C Resample
1     setosa     setosa 0.982533940 0.009013592 0.008452468       11 0.25   Fold01
2     setosa     setosa 0.955755054 0.032289120 0.011955826       35 0.25   Fold01
3     setosa     setosa 0.941292675 0.044903583 0.013803742       46 0.25   Fold01
4     setosa     setosa 0.983559919 0.008310323 0.008129757       49 0.25   Fold01
5     setosa     setosa 0.972285699 0.018109218 0.009605083       50 0.25   Fold01
6 versicolor versicolor 0.007223973 0.971168170 0.021607858       59 0.25   Fold01
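Since `mod$pred` is an ordinary data frame, per-fold performance can be computed directly from it. A minimal base-R sketch for the classification case; `fold_accuracy` is a hypothetical helper (not part of caret), and the toy data frame is hand-made with the same columns, not real iris output:

```r
# Per-fold accuracy from a table of out-of-fold class predictions.
# `cv_pred` is assumed to have the columns shown in mod$pred above:
# pred, obs and Resample (one row per held-out observation).
fold_accuracy <- function(cv_pred) {
  # Compare as character so factors with different level sets don't error
  correct <- as.character(cv_pred$pred) == as.character(cv_pred$obs)
  # Mean of TRUE/FALSE within each fold = proportion correctly classified
  tapply(correct, cv_pred$Resample, mean)
}

# Tiny hand-made example with the same column layout (hypothetical data)
toy <- data.frame(
  pred = c("setosa", "setosa", "virginica", "versicolor"),
  obs = c("setosa", "versicolor", "virginica", "versicolor"),
  Resample = c("Fold01", "Fold01", "Fold02", "Fold02")
)
acc <- fold_accuracy(toy)
print(acc)
```

With the real model, `mod$pred` contains rows for every candidate `C`, so subset to one value first, e.g. `fold_accuracy(mod$pred[mod$pred$.C == 0.25, ])`.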

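EDIT: `C` is one of the tuning parameters of your SVM, the cost of constraint violations. caret evaluates each candidate value of `C` by cross-validation and keeps the one with the smallest cross-validated RMSE, which is what the "RMSE was used to select the optimal model" message means. Check out the help for the `ksvm` function in the kernlab package for more details.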
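To make the role of `C` concrete, here is a sketch of the standard ε-insensitive SVR problem. For a numeric response, kernlab's `ksvm()` defaults to `type = "eps-svr"`, and caret's `svmLinear` uses a linear kernel, so for a linear model the fit should correspond roughly to:

$$
\begin{aligned}
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi},\,\boldsymbol{\xi}^*}\quad & \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)\\
\text{subject to}\quad & y_i - (\mathbf{w}^\top\mathbf{x}_i + b) \le \varepsilon + \xi_i,\\
& (\mathbf{w}^\top\mathbf{x}_i + b) - y_i \le \varepsilon + \xi_i^*,\\
& \xi_i,\ \xi_i^* \ge 0,\qquad i = 1,\dots,n.
\end{aligned}
$$

Errors smaller than $\varepsilon$ are ignored; larger ones are penalised in proportion to $C$. A small $C$ tolerates more large errors in exchange for a flatter (more regularised) fit, while a large $C$ pushes the model to fit the training data closely. That trade-off is why `C` is tuned by cross-validation rather than fixed.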

EDIT2: Trivial regression example

> library(caret)
> ctrl <- trainControl(method = "cv", savePred=T)
> mod <- train(Sepal.Length~., data=iris, method = "svmLinear", trControl = ctrl)
> head(mod$pred)
      pred obs rowIndex   .C Resample
1 4.756119 4.8       13 0.25   Fold01
2 4.910948 4.8       31 0.25   Fold01
3 5.094275 4.9       38 0.25   Fold01
4 4.728503 4.8       46 0.25   Fold01
5 5.192965 5.3       49 0.25   Fold01
6 5.969479 5.9       62 0.25   Fold01
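To see how close the out-of-fold predictions are to the observed values fold by fold, here is a minimal base-R sketch that summarises a data frame laid out like `mod$pred` above (columns `pred`, `obs`, `Resample`). `fold_summary` is a hypothetical helper, not a caret function, and the toy data are made up:

```r
# Summarise out-of-fold regression predictions for each resample.
# `cv_pred` is assumed to look like mod$pred from caret's train():
# columns pred (predicted), obs (observed) and Resample (fold label).
fold_summary <- function(cv_pred) {
  folds <- split(cv_pred, cv_pred$Resample)
  rows <- lapply(names(folds), function(f) {
    d <- folds[[f]]
    err <- d$pred - d$obs
    data.frame(Resample = f,
               n = nrow(d),                  # held-out rows in this fold
               RMSE = sqrt(mean(err^2)),     # root mean squared error
               MAE = mean(abs(err)),         # mean absolute error
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Hand-made toy input with the same columns (hypothetical data)
toy <- data.frame(pred = c(1, 2, 3, 4), obs = c(1, 2, 2, 6),
                  Resample = c("Fold01", "Fold01", "Fold02", "Fold02"))
s <- fold_summary(toy)
print(s)
```

With `savePred=T`, `mod$pred` holds predictions for every candidate `C`, so subset to one value first, e.g. `fold_summary(mod$pred[mod$pred$.C == 0.25, ])`. With `repeatedcv`, each fold/repeat combination has its own `Resample` label and is summarised separately. caret also stores per-fold RMSE and R-squared for the selected model in `mod$resample`.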
David
  • Hi, thanks for the reply. I've changed the trainControl call accordingly, and I've included part of the dataset I'm working with (the cadets dataset). I don't know how to modify the `head(mod$pred)` part so that I can see the RT (seconds) values predicted by the model I've just created, since I'm modelling RT (seconds) against 151 descriptor variables. How would I do it in this case? I hope this makes sense. – user2062207 Dec 09 '13 at 01:40
  • You shouldn't have to modify the `mod$pred` part. Your "mod" object is your caret model which is a list that contains an element named "pred" that contains your CV predictions. – David Dec 09 '13 at 02:57
  • However, I keep getting NULL every time I try that. Isn't `mod$pred` used for classification? I'm trying to do regression, which may explain why this is happening. – user2062207 Dec 09 '13 at 03:23
  • No, if you're getting NULL then you did not include `savePred=T` in your `trainControl` object. I added an edit that demonstrates this with a regression example. – David Dec 09 '13 at 04:02
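A `NULL` from `mod$pred` means the predictions were never saved, and it is easy to miss because R doesn't complain about accessing a missing list element. A small hypothetical helper (not a caret function) that fails loudly instead:

```r
# Return the saved cross-validation predictions from a fitted caret model,
# or stop with a hint if they were not saved (mod$pred is then NULL).
get_cv_predictions <- function(mod) {
  if (is.null(mod$pred)) {
    stop("mod$pred is NULL: refit with trainControl(savePredictions = TRUE)")
  }
  mod$pred
}

# Stand-ins for a fitted model (hypothetical lists, not real train() output)
with_preds <- list(pred = data.frame(pred = 4.9, obs = 4.8, Resample = "Fold01"))
without_preds <- list()
cv <- get_cv_predictions(with_preds)
```

With the model from the question, this would be `cv <- get_cv_predictions(mod)` after refitting with `trainControl(method = "repeatedcv", repeats = 10, savePredictions = TRUE)`.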