
I am trying to use the F1 score to choose the value of k that gives the best k-nearest-neighbours model. The model is fitted with the train function from the caret package.

Example dataset: https://www.kaggle.com/lachster/churndata

My current code uses the following function to compute the F1 score:

f1 <- function(data, lev = NULL, model = NULL) {
    precision <- posPredValue(data$pred, data$obs, positive = "pass")
    recall <- sensitivity(data$pred, data$obs, positive = "pass")
    f1_val <- (2*precision*recall) / (precision + recall)
    names(f1_val) <- c("F1")
    f1_val
}

The following as train control:

train.control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                              summaryFunction = f1, search = "grid")

And the following as my final execution of the train command:

x <- train(CHURN ~. , 
  data = experiment, 
  method = "knn", 
  tuneGrid = expand.grid(.k=1:30), 
  metric = "F1", 
  trControl = train.control)

Please note that the model is attempting to predict the churn rate from a set of telco customers.

The execution returns the following result: Something is wrong; all the F1 metric values are missing:

       F1     
 Min.   : NA  
 1st Qu.: NA  
 Median : NA  
 Mean   :NaN  
 3rd Qu.: NA  
 Max.   : NA  
 NA's   :30   
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.

EDIT: Thanks to help from missuse, my code now looks like the following, but it returns the error shown after it:

    levels(exp2$CHURN) <- make.names(levels(factor(exp2$CHURN)))
    
    library(mlbench)
    
    train.control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                                  summaryFunction = prSummary, classProbs = TRUE)
    
    knn_fit <- train(CHURN ~ ., data = exp2, method = "knn",
                     trControl = train.control, preProcess = c("center", "scale"),
                     tuneLength = 15, metric = "F")

The error:

Error in trainControl(method = "repeatedcv", number = 10, repeats = 3,  : 
  object 'prSummary' not found
user438383
Nalhcal

1 Answer


caret contains a summary function, prSummary, that provides the F1 score. Full example:

library(caret)
library(mlbench)
data(Sonar)

train.control <- trainControl(method = "repeatedcv", number = 10, repeats = 3, 
                              summaryFunction = prSummary, classProbs = TRUE)


knn_fit <- train(Class ~., data = Sonar, method = "knn",
                 trControl=train.control ,
                 preProcess = c("center", "scale"),
                 tuneLength = 15,
                 metric = "F")
knn_fit
#output
k-Nearest Neighbors 

208 samples
 60 predictor
  2 classes: 'M', 'R' 

Pre-processing: centered (60), scaled (60) 
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 187, 188, 187, 188, 187, 187, ... 
Resampling results across tuning parameters:

  k   AUC        Precision  Recall     F        
   5  0.3582687  0.7936713  0.9065657  0.8414592
   7  0.4985709  0.7758271  0.8883838  0.8239438
   9  0.6632328  0.7484092  0.8853535  0.8089210
  11  0.7426320  0.7151175  0.8676768  0.7814297
  13  0.7388742  0.6883105  0.8646465  0.7641392
  15  0.7594436  0.6787983  0.8467172  0.7520524
  17  0.7583071  0.6909693  0.8527778  0.7616448
  19  0.7702208  0.6913001  0.8585859  0.7644433
  21  0.7642698  0.6962528  0.8707071  0.7719442
  23  0.7652370  0.6945755  0.8707071  0.7696863
  25  0.7606508  0.6929364  0.8707071  0.7683987
  27  0.7454728  0.6916762  0.8676768  0.7669464
  29  0.7551679  0.6900416  0.8707071  0.7676640
  31  0.7603099  0.6935720  0.8828283  0.7749490
  33  0.7614621  0.6938805  0.8770202  0.7728923

F was used to select the optimal model using the largest value.
The final value used for the model was k = 5.
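
Where prSummary is not available, a custom summary function is another route. The following is a minimal sketch, not the method used above: `f1_summary` is a hypothetical name, it takes the first factor level (`lev[1]`) as the positive class instead of a hard-coded label such as "pass", and it uses the equivalent form F1 = 2TP / (2TP + FP + FN) so that a fold with no positive predictions gives 0 rather than NaN. Whether the hard-coded label caused the missing values in the question is an assumption.

```r
# Hypothetical custom summary function for caret::trainControl().
# Assumption: the first level of the outcome factor is the positive class.
f1_summary <- function(data, lev = NULL, model = NULL) {
  # Fall back to the observed factor levels when lev is not supplied
  if (is.null(lev)) lev <- levels(data$obs)
  positive <- lev[1]

  tp <- sum(data$pred == positive & data$obs == positive)  # true positives
  fp <- sum(data$pred == positive & data$obs != positive)  # false positives
  fn <- sum(data$pred != positive & data$obs == positive)  # false negatives

  # F1 = 2TP / (2TP + FP + FN); undefined only when all three counts are zero
  denom <- 2 * tp + fp + fn
  f1_val <- if (denom == 0) NA_real_ else 2 * tp / denom

  c(F1 = f1_val)
}

# Toy check, independent of caret: tp = 2, fp = 1, fn = 1
obs  <- factor(c("yes", "yes", "no", "no", "yes"), levels = c("yes", "no"))
pred <- factor(c("yes", "no", "no", "yes", "yes"), levels = c("yes", "no"))
f1_summary(data.frame(obs = obs, pred = pred), lev = levels(obs))
# F1 = 2*2 / (2*2 + 1 + 1) = 2/3 for this toy data
```

Passing `summaryFunction = f1_summary` to `trainControl` and `metric = "F1"` to `train` should then select k by this value, provided the level you care about is the first level of the outcome factor.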
missuse
  • Awesome!! I now receive: `Error in trainControl(method = "repeatedcv", number = 10, repeats = 3, : object 'prSummary' not found` when running the train.control – Nalhcal Mar 24 '18 at 16:44
  • Is there a way to get around this? I'm working on a school computer and am unable to update R – Nalhcal Mar 24 '18 at 17:27
  • That function has been around for a while, so you could just get the CRAN version. You didn't say which version you are using – topepo Mar 24 '18 at 19:57
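
To follow up on the version question, a quick check along these lines may help. It is only a sketch: `has_pr_summary` is a hypothetical helper name, and it simply reports whether the installed caret exports prSummary.

```r
# Hypothetical helper: TRUE if the installed caret exports prSummary
has_pr_summary <- function() {
  # caret not installed at all: nothing to check
  if (!requireNamespace("caret", quietly = TRUE)) return(FALSE)
  "prSummary" %in% getNamespaceExports("caret")
}

# Report the installed version (if any) alongside the check
if (requireNamespace("caret", quietly = TRUE)) {
  print(packageVersion("caret"))
}
has_pr_summary()
```

If this returns FALSE with caret installed, the installed version probably predates prSummary, and installing the current CRAN release into a personal library may be an option where system-wide updates are not allowed.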