Below is a set of fictitious probability data, which I converted into binomial with a threshold of 0.5
. I ran a glm()
model on the discrete data to test if the intervals returned from glm() were 'mean prediction intervals' ("Confidence Interval") or 'point prediction intervals'("Prediction Interval"). It appears from the plot below that the returned intervals are the latter--'Point Prediction Intervals'; note, with 95% confidence, 2/20 points fall outside of the line in this sample.
If this is indeed the case, how do I generate the the 'mean prediction interval' (i.e, "Confidence Intervals") in R for a binomial data set bound by 0 and 1 using glm()? Please show your code and plot similar to mine with the fit line, given probabilities, 'confidence intervals' and 'prediction intervals'.
# Fictitious data
xVal <- c(15,15,17,18,32,33,41,42,47,50,
53,55,62,63,64,65,66,68,70,79,
94,94,94,95,98)
randRatio <- c(.01,.03,.05,.04,.01,.2,.1,.08,.88,.2,
.2,.99,.49,.88,.2,.88,.66,.87,.66,.90,
.98,.88,.95,.95,.95)
# Converted to binomial
randBinom <- ifelse(randRatio < .5, 0, 1)
# Data frame for model
binomData <- data.frame(
randBinom = randBinom,
xVal = xVal
)
# Model
mode1 <- glm(randBinom~ xVal, data = binomData, family = binomial(link = "logit"))
# Predict all points in xVal range
frame <- data.frame(xVal=(0:100))
predAll <- predict(mode1, newdata = frame,type = "link", se.fit=TRUE)
# Params for intervals and plot
confidence <- .95
score <- qnorm((confidence / 2) + .5)
frame <- data.frame(xVal=(0:100))
#Plot
with(binomData, plot(xVal, randBinom, type="n", ylim=c(0, 1),
ylab = "Probability", xlab="xVal"))
lines(frame$xVal, plogis(predAll$fit), col = "red", lty = 1)
lines(frame$xVal, plogis(predAll$fit + score * predAll$se.fit), col = "red", lty = 3)
lines(frame$xVal, plogis(predAll$fit - score * predAll$se.fit), col = "red", lty = 3)
points(xVal, randRatio, col = "red") # Original probabilities
points(xVal, randBinom, col = "black", lwd = 3) # Binomial Points used in glm
Here's the plot, presumably with 'point prediction intervals' (i.e., "Prediction Intervals") in dashed red, and the mean fit in solid red. Black dots represent the discrete binomial data from original probabilities in randRatio
: