I have been trying to program a gradient descent algorithm in R for logistic regression in order to understand it better. Andrew Ng's ML course seems to skip this part and shows the advanced optimization approach instead, but I would like to recreate the gradient descent approach myself. Here is my attempt:
###my data
X <- c(34.62366, 30.28671, 35.84741, 60.18260, 79.03274)
X <- cbind(rep(1,5),X)
y <- c(0, 0, 0, 1, 1)
###sigmoid function to calculate predicted probabilities
sigmoid <- function(z) {
  # compute the sigmoid (logistic) function element-wise
  z <- as.matrix(z)
  g <- 1 / (1 + exp(-z))
  g
}
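As a quick sanity check on this helper (illustrative values only, not part of the question), sigmoid(0) should return 0.5 and large negative/positive inputs should approach 0 and 1:
sigmoid(0)              # 0.5
sigmoid(c(-10, 0, 10))  # roughly 0, 0.5 and 1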
###Gradient Descent
theta <- c(0,0)
iterations <- 15000
alpha <- 0.02
m <- length(y)
for (i in 1:iterations) {
  theta_prev <- theta
  p <- dim(X)[2]                     # number of coefficients
  for (j in 1:p) {
    h <- sigmoid(X %*% theta_prev)   # predicted probabilities
    # gradient of the cost with respect to theta[j]
    deriv <- (t(h - y) %*% X[,j]) / m
    theta[j] <- theta_prev[j] - (alpha * deriv)
  }
}
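As an aside, I believe the inner loop over j is equivalent to this single vectorized update; I include it only to state the update rule more compactly, in case I have the maths wrong:
theta <- c(0, 0)                             # reset to the same starting point
for (i in 1:iterations) {
  h <- sigmoid(X %*% theta)                  # predicted probabilities
  grad <- (t(X) %*% (h - y)) / m             # gradient of the cost w.r.t. theta
  theta <- theta - alpha * as.vector(grad)   # simultaneous update of both coefficients
}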
Running my gradient descent gives me final coefficients of -11.95 and 0.24, whereas the glm function in R gives me -90.87 and 1.89. Does anyone know where my code goes wrong?
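For reference, the cost that this update rule is supposed to minimize is the negative average log-likelihood; a small helper like the one below (a sketch, using the same X, y and sigmoid as above) makes it easy to check that the cost actually decreases and has levelled off after 15000 iterations:
cost <- function(theta, X, y) {
  h <- sigmoid(X %*% theta)
  -mean(y * log(h) + (1 - y) * log(1 - h))  # negative average log-likelihood
}
cost(c(0, 0), X, y)  # log(2), about 0.693, at the zero starting values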
Here's the code for the glm model:
X <- X[,2]  # keep only the feature column; glm adds its own intercept
mod <- glm(y ~ X, family = 'binomial')
coef(mod)
Thanks in advance!
EDIT: With the larger dataset below, which doesn't have perfect separation, the discrepancy between the coefficients remains. It also remains with even larger datasets of 100 observations.
X <- c(34.62366, 30.28671, 35.84741, 60.18260, 79.03274, 45.08328, 61.10666,
75.02475, 76.09879, 84.43282, 95.86156, 75.01366, 82.30705, 69.36459, 39.53834)
X <- cbind(rep(1, length(X)), X)
y <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0)
With this slightly larger dataset my attempt returns coefficients of -18.46 and 0.15, whilst R's glm returns -4.12 and 0.07.
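Using the cost helper sketched above (and assuming X here is still the 15-row matrix with the intercept column, i.e. before it is overwritten for the glm call), the two sets of coefficients can be compared directly:
cost(c(-18.46, 0.15), X, y)  # cost at my gradient descent estimates
cost(c(-4.12, 0.07), X, y)   # cost at glm's estimates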