I have some data similar to:
data(Titanic) # need one row per passenger
df <- data.frame(Titanic, stringsAsFactors=TRUE)
df <- df[rep(seq_len(nrow(df)), df[,"Freq"]), which(names(df)!="Freq")]
I trained a logistic regression model in caret using repeated cross-validation, like:
library(caret)
tc <- trainControl(method="repeatedcv", number=10, repeats=3,
                   returnData=TRUE, savePredictions=TRUE, classProbs=TRUE)
# No weights argument: Freq was dropped above and df already has one row per passenger
glmFit <- train(Survived ~ Class + Sex + Age, data = df,
                method="glm", family="binomial",
                trControl = tc)
summary(glmFit)
For each row in the data frame, I would like to obtain the average in-sample fitted probability and the average out-of-sample predicted probability. With 10-fold CV repeated 3 times, each row is in the training set of 27 resamples and in the holdout set of 3, so these are averages of 27 and of 3 values, respectively.
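To check that count, here is a minimal sketch. It assumes train() builds its repeatedcv resamples the same way createMultiFolds() does (that is my understanding, not something I have confirmed for every caret version), and y is a made-up stand-in outcome, not the real data:

```r
library(caret)

# Hypothetical stand-in outcome, same length as the expanded Titanic data
set.seed(1)
y <- factor(sample(c("No", "Yes"), 2201, replace = TRUE))

# Training-row indices for 10-fold CV repeated 3 times (30 resamples in total)
train_idx <- createMultiFolds(y, k = 10, times = 3)

# For each row: number of resamples where it is in the training set,
# and number of resamples where it is held out
n_train   <- tabulate(unlist(train_idx), nbins = length(y))
n_holdout <- length(train_idx) - n_train

table(n_train)
table(n_holdout)
```

If the assumption holds, both tables should have a single entry, since every row is held out exactly once per repeat.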
I would then like to append each row's average in-sample and out-of-sample probability estimates to the data frame as two new columns, like the last two columns of:
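> df_appended
| Class | Sex | Age   | Survived | training_p_surv_est | testing_p_surv_est |
|-------|-----|-------|----------|---------------------|--------------------|
| 3rd   | M   | Child | 0        | .251                | .259               |
| 3rd   | M   | Child | 1        | .251                | .259               |
| 2nd   | M   | Child | 1        | .324                | .319               |
| 2nd   | M   | Child | 0        | .324                | .319               |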
According to ?trainControl, I have saved the holdout predictions for each resample with savePredictions=TRUE (and set classProbs=TRUE, since I want raw probabilities, not classes).
How do I access the in-sample and out-of-sample predictions? Looking at ?predict.train, I have tried:
extractProb(list(glmFit))
#Error in eval(expr, envir, enclos) : object 'Class2nd' not found
Many thanks.