In An Introduction to Statistical Learning we're asked to perform leave-one-out cross-validation (LOOCV) for logistic regression manually. The code for it is:
library(ISLR)  # provides the Weekly data set

count = rep(0, dim(Weekly)[1])
for (i in 1:(dim(Weekly)[1])) {
  ## fit a logistic regression model, leaving the ith observation out of the training data
  glm.fit = glm(Direction ~ Lag1 + Lag2, data = Weekly[-i, ], family = binomial)
  ## predict the held-out observation and check whether the prediction is wrong
  is_up = predict.glm(glm.fit, Weekly[i, ], type = "response") > 0.5
  is_true_up = Weekly[i, ]$Direction == "Up"
  if (is_up != is_true_up)
    count[i] = 1
}
sum(count)
##[1] 490
The source of this code can be found here.
This means the LOOCV error rate is approximately 45% (490 errors out of 1089 observations).
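The rate can be checked directly from the counts above; a quick sanity check, assuming the Weekly data set from the ISLR package with its 1089 rows:

sum(count) / nrow(Weekly)
## [1] 0.4499541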
But when we do it using the cv.glm() function from the boot package, the result is very different.
> library(boot)
> glm.fit = glm(Direction ~ Lag1 + Lag2, data = Weekly, family = binomial)
> cv.err = cv.glm(Weekly, glm.fit)  # store the result rather than masking the cv.glm function
> cv.err$delta
[1] 0.2464536 0.2464530
Why does this occur? What exactly does the cv.glm() function do?
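For context, our current guess: the help page for cv.glm says its default cost function is the average squared error between the response and the cross-validated fitted values, not the misclassification rate. A minimal sketch of what we would pass instead, assuming the cost argument behaves as documented in ?cv.glm:

library(ISLR)
library(boot)

glm.fit = glm(Direction ~ Lag1 + Lag2, data = Weekly, family = binomial)

## 0/1 misclassification cost: r is the observed response (0/1),
## pi is the cross-validated fitted probability; an error is counted
## whenever they fall on opposite sides of 0.5
cost = function(r, pi) mean(abs(r - pi) > 0.5)

cv.err = cv.glm(Weekly, glm.fit, cost)
cv.err$delta  # we expect roughly 490/1089 ≈ 0.45 if this guess is right

If that reproduces the 45% figure, the remaining question is just what the default cost is actually computing.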