In R caret, obtain in-sample and out-of-sample probability estimates

Question

I have some data similar to:

data(Titanic) # need one row per passenger

df <- data.frame(Titanic, stringsAsFactors=TRUE) 

df <- df[rep(seq_len(nrow(df)), df[,"Freq"]), which(names(df)!="Freq")]

I trained a model in caret using repeated cross-validated logistic regression, like:

library(caret) 

tc <- trainControl(method="repeatedcv", number=10, repeats=3, 
                   returnData=TRUE, savePredictions=TRUE, classProbs=TRUE)

glmFit <- train(Survived ~ Class + Sex + Age, data = df, weights=Freq, 
                method="glm", family="binomial",
                trControl = tc)

summary(glmFit)

I would like to obtain the average in-sample fitted probability and out-of-sample predicted probability (averages of 27 and of 3 values for each row in the data frame, respectively, in this case since it's 10-fold CV x 3 repeats).

I would like to append each row's average in-sample and out-of-sample probability estimates onto the data frame -- to look like the last two columns of:

> df_appended
  Class  Sex  Age    Survived  training_p_surv_est  testing_p_surv_est
  3rd    M    Child  0         .251                 .259
  3rd    M    Child  1         .251                 .259
  2nd    M    Child  1         .324                 .319
  2nd    M    Child  0         .324                 .319

According to ?trainControl, I have saved the holdout predictions for each resample with savePredictions=TRUE. (And classProbs=TRUE, since I want raw probabilities, not classes.)

How do I access the in-sample and out-of-sample predictions? Looking at ?predict.train, I have tried using

extractProb(list(glmFit)) 
#Error in eval(expr, envir, enclos) : object 'Class2nd' not found

Many thanks.

score 0 · Answer 1 · answered Jun 03 '15 at 18:45


If you take a look at your glmFit object, you will see that it contains an element named 'pred'.

head(glmFit$pred)

This gives the held-out predicted probability, as well as the predicted class, for each row in each fold and repeat.
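As a rough sketch of how those held-out predictions could be averaged per row and attached to the data frame: the code below assumes `Survived` has the levels `No`/`Yes` (so the probability column in `pred` is named `Yes`), and it drops the `weights=Freq` argument from the question, since `Freq` is no longer a column once the rows have been expanded. The seed is arbitrary.

```r
library(caret)

# One row per passenger, as in the question
data(Titanic)
df <- data.frame(Titanic, stringsAsFactors = TRUE)
df <- df[rep(seq_len(nrow(df)), df[, "Freq"]), which(names(df) != "Freq")]
rownames(df) <- NULL

set.seed(1)  # arbitrary seed, only for reproducible folds
tc <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                   savePredictions = TRUE, classProbs = TRUE)

# Assumption: weights = Freq is omitted because Freq was dropped above
glmFit <- train(Survived ~ Class + Sex + Age, data = df,
                method = "glm", family = "binomial", trControl = tc)

# glmFit$pred holds one row per held-out prediction;
# rowIndex points back to the row of df that was predicted
oos <- aggregate(Yes ~ rowIndex, data = glmFit$pred, FUN = mean)

# Average of the 3 held-out probabilities (one per repeat) for each row
df$testing_p_surv_est <- oos$Yes[match(seq_len(nrow(df)), oos$rowIndex)]
head(df)
```

With 10 folds and 3 repeats, each row should appear in `glmFit$pred` three times, so each average is taken over three values.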

cheers.


yuanhangliu1


I am not sure this is what the question asked about. `glmFit$pred` would still give only the out-of-sample predictions, since in every fold the held-out sample is not used for training, only for prediction. – exAres Jun 21 '15 at 16:28
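For the in-sample side, `train` does not appear to keep the fitted values from each resample, so one possible workaround is to refit the model on each resample's training rows and average the fitted probabilities per row. The sketch below assumes the training indices are available in `glmFit$control$index`, refits with plain `glm` rather than through caret, and again omits `weights=Freq`; it is an illustration, not necessarily what caret does internally.

```r
library(caret)

# Same setup as in the question (weights = Freq omitted, see note above)
data(Titanic)
df <- data.frame(Titanic, stringsAsFactors = TRUE)
df <- df[rep(seq_len(nrow(df)), df[, "Freq"]), which(names(df) != "Freq")]
rownames(df) <- NULL

set.seed(1)  # arbitrary seed
tc <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                   savePredictions = TRUE, classProbs = TRUE)
glmFit <- train(Survived ~ Class + Sex + Age, data = df,
                method = "glm", family = "binomial", trControl = tc)

# Training-row indices for each of the 30 resamples
idx <- glmFit$control$index

in_sum <- numeric(nrow(df))  # running sum of fitted probabilities per row
in_n   <- integer(nrow(df))  # number of resamples in which a row was trained on

for (rows in idx) {
  fit <- glm(Survived ~ Class + Sex + Age, data = df[rows, ],
             family = binomial)
  # Fitted P(Survived == "Yes") for the rows used to train this resample
  p <- predict(fit, newdata = df[rows, ], type = "response")
  in_sum[rows] <- in_sum[rows] + p
  in_n[rows]   <- in_n[rows] + 1L
}

# Average over the 27 resamples in which each row was part of the training set
df$training_p_surv_est <- in_sum / in_n
head(df)
```

Each row is held out once per repeat, so it is in the training set for 9 of 10 folds in each of the 3 repeats, which gives the 27 values mentioned in the question.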
