Given sample data of proportions of successes plus sample sizes and independent variable(s), I am attempting logistic regression in R.
The following code does what I want and seems to give sensible results, but it does not look like a sensible approach: in effect, it doubles the size of the data set.
datf <- data.frame(prop = c(0.125, 0, 0.667, 1, 0.9),
                   cases = c(8, 1, 3, 3, 10),
                   x = c(11, 12, 15, 16, 18))
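# Stack two copies of the data: the first half for successes, the second for failures
datf2 <- rbind(datf, datf)
datf2$success <- rep(c(1, 0), each = nrow(datf))
# Convert each proportion into a count of successes or failures, used as a weight
datf2$cases <- round(datf2$cases * ifelse(datf2$success, datf2$prop, 1 - datf2$prop))
fit2 <- glm(success ~ x, weights = cases, data = datf2, family = "binomial")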
datf$proppredicted <- 1 / (1 + exp(-predict(fit2, datf)))
plot(datf$x, datf$proppredicted, type="l", col="red", ylim=c(0,1))
points(datf$x, datf$prop, cex=sqrt(datf$cases))
The resulting plot looks reasonably sensible.
But I am not happy about the use of datf2 as a way of separating the successes and failures by duplicating the data. Is something like this necessary?
As a lesser question, is there a cleaner way of calculating the predicted proportions?
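
For comparison, here is a minimal sketch of one possible alternative that avoids the duplicated data frame. It assumes the proportions can be rounded back to integer success counts, and it relies on glm() accepting a two-column matrix of successes and failures as a binomial response. The column names successes and failures and the object name fit are introduced here for illustration only. It also uses predict() with type = "response", which should return fitted probabilities directly instead of values on the logit scale.

```r
datf <- data.frame(prop = c(0.125, 0, 0.667, 1, 0.9),
                   cases = c(8, 1, 3, 3, 10),
                   x = c(11, 12, 15, 16, 18))

# Assumption: the proportions are rounded versions of integer counts,
# so the counts can be recovered by rounding prop * cases
datf$successes <- round(datf$prop * datf$cases)
datf$failures <- datf$cases - datf$successes

# Two-column (successes, failures) response: one row per group, no duplication
fit <- glm(cbind(successes, failures) ~ x, data = datf, family = binomial)

# Predicted proportions on the probability scale, without a manual inverse logit
datf$proppredicted <- predict(fit, newdata = datf, type = "response")
```

Passing prop as the response with weights = cases is another documented form, but with a rounded value such as 0.667 it would probably trigger a warning about a non-integer number of successes, which is why the sketch rebuilds the counts first.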