
I’m trying to implement logistic regression with stochastic gradient descent in R. I have followed Andrew Ng’s example dataset “ex2data1.txt”.

The algorithm runs, but the theta estimates are not what I expected. I tried changing the whole algorithm to fix this, but I was not able to find the error causing the problem. It would be very helpful if someone could check the example and tell me why the thetas are not being calculated correctly. I really appreciate it.

Regarding the programming, I'm not using any built-in R functions or matrix calculations. I'm only using sums and subtractions in loops, because I want to use the code in Hadoop and I cannot use matrix calculus or functions that are already programmed in R, such as "sum", "sqrt", etc.

Stochastic Gradient Descent is:

Loop {
  for i = 1 to m {
    θj := θj + α (y(i) − hθ(x(i))) xj(i)
  }
}

And the logistic regression hypothesis is hθ(x) = 1 / (1 + e^(−θᵀx)). (The original post linked an image of this formula.)

My code is:

data1 <- read.table("~/ex2data1.txt", sep = ",")
names(data1) <- c("Exam1", "Exam2", "Admit")

# Sample the data for stochastic gradient descent

ss <- data1[sample(nrow(data1), size = nrow(data1), replace = FALSE), ]

x <- with(ss, matrix(cbind(1, Exam1), nrow = nrow(ss)))
y <- c(ss$Admit)
m <- nrow(x)

# startup parameters

iterations <- 1
j <- vector()
alpha <- 0.05
theta <- c(0, 0)



#My loop

while (iterations <= 10) {

    coste <- c(0, 0)
    suma <- 0

    for (i in 1:m) {

        # h <- 1/(1 + exp(-theta * x))
        h <- 1/(1 + exp((-theta) * x[i, ]))

        # Cost(h(x), y) = y(i)*log(h(x)) + (1 - y(i))*log(1 - h(x))
        cost <- (y[i] * log(h)) + ((1 - y[i]) * log(1 - h))

        # sum(cost), i = 1 to m
        suma <- suma + cost

        # differences = (h(x(i)) - y(i)) * x(i)
        difference <- (h - y[i]) * x[i, ]

        # sum the differences
        coste <- coste + difference

        # theta update: theta_j := theta_j - alpha * (1/m) * coste[j]
        theta[1] <- theta[1] - alpha * (1/m) * coste[1]
        theta[2] <- theta[2] - alpha * (1/m) * coste[2]
    }

    # J(theta) = (-1/m) * sum(y(i)*log(h(x)) + (1 - y(i))*log(1 - h(x)))
    j[iterations] <- (-1/m) * suma

    iterations <- iterations + 1
}



If I compare my thetas with R's glm:

Call:  glm(formula = y ~ x[, 2], family = binomial("logit"), data = data1)

Coefficients:

  Intercept: -4.71816
  x[, 2]:     0.08091

My thetas:

  Intercept: 0.4624024
  x[, 2]:    1.3650234
user3488416

2 Answers


I have implemented a solution in R for Ng's other example set, ex2data2.txt. Here is my code:

sigmoid <- function(z) {
    return(1/(1 + exp(-z)))
}


mapFeature <- function(X1, X2) {
    degree <- 6
    out <- rep(1, length(X1))
    for (i in 1:degree) {
        for (j in 0:i) {
            out <- cbind(out, (X1^(i - j)) * (X2^j))
        }
    }
    return(out)
}


## Regularized cost function
fr <- function(theta, X, y, lambda) {
    m <- length(y)
    return(1/m * sum(-y * log(sigmoid(X %*% theta)) - (1 - y) *
        log(1 - sigmoid(X %*% theta))) + lambda/2/m * sum(theta[-1]^2))
}


## Gradient
grr <- function(theta, X, y, lambda) {
    m <- length(y)
    return(1/m * t(X) %*% (sigmoid(X %*% theta) - y) + lambda/m *
        c(0, theta[-1]))
}

data <- read.csv("ex2data2.txt", header = F)
X <- as.matrix(data[, c(1, 2)])
y <- data[, 3]
X <- mapFeature(X[, 1], X[, 2])
m <- nrow(X)
n <- ncol(X)
initial_theta <- rep(0, n)
lambda <- 1
res <- optim(initial_theta, fr, grr, X, y, lambda,
    method = "BFGS", control = list(maxit = 100000))
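To see the same cost/gradient/optim pattern run end to end without the ex2data1/ex2data2 files, here is a minimal self-contained sketch on synthetic data (the data and the true coefficients (-1, 2) are made up for illustration; the functions mirror the ones above):

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Regularized cost and gradient, same shape as fr/grr above
fr <- function(theta, X, y, lambda) {
    m <- length(y)
    1/m * sum(-y * log(sigmoid(X %*% theta)) -
              (1 - y) * log(1 - sigmoid(X %*% theta))) +
        lambda/2/m * sum(theta[-1]^2)
}
grr <- function(theta, X, y, lambda) {
    m <- length(y)
    as.vector(1/m * t(X) %*% (sigmoid(X %*% theta) - y) +
              lambda/m * c(0, theta[-1]))
}

# Synthetic data generated from a logistic model with theta = (-1, 2)
set.seed(42)
X <- cbind(1, rnorm(500))
y <- rbinom(500, 1, sigmoid(X %*% c(-1, 2)))

res <- optim(rep(0, 2), fr, grr, X = X, y = y, lambda = 1, method = "BFGS")
res$par  # fitted (regularized) intercept and slope
```

After convergence the fitted theta vector is in `res$par`; with regularization the estimates are shrunk somewhat toward zero relative to the generating coefficients.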
gd047
  • Hi! Thanks for your answer, but I'm only using sums and subtractions in loops because I want to use the code in Hadoop, and I cannot use matrix calculus or functions that are already programmed in R such as "sum", "sqrt", "optim", etc. – user3488416 Apr 03 '14 at 11:20
  • What is the purpose of the mapFeature function? – Lecromine Jul 24 '18 at 13:16
  • One way to fit the data better is to create more features from each data point. In the provided function mapFeature.m, we will map the features into all polynomial terms of x1 and x2 up to the sixth power. As a result of this mapping, our vector of two features (the scores on two QA tests) has been transformed into a 28-dimensional vector. A logistic regression classifier trained on this higher-dimension feature vector will have a more complex decision boundary and will appear nonlinear when drawn in our 2-dimensional plot. – gd047 Jul 24 '18 at 16:07

Shouldn't `*` be `%*%` in a few instances? E.g. `h <- 1/(1 + exp(-(theta %*% x[i, ])))` instead of `h <- 1/(1 + exp((-theta) * x[i, ]))`. With elementwise `*`, `h` becomes a vector of length 2 rather than a scalar, which throws off every quantity computed from it.
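Since the question rules out `%*%` and other built-in matrix operations, a loop-only equivalent of that fix is possible: accumulate the inner product θᵀx in a plain scalar before applying the sigmoid. A minimal sketch (the function name `hyp` is made up for illustration):

```r
# Loop-only hypothesis for one training example: no %*%, no sum().
# theta and xi are plain numeric vectors of the same length.
hyp <- function(theta, xi) {
    z <- 0
    for (k in 1:length(theta)) {
        z <- z + theta[k] * xi[k]   # accumulate theta' x as a scalar
    }
    1 / (1 + exp(-z))               # sigmoid of a scalar, not of a vector
}

hyp(c(0, 0), c(1, 34.6))  # with theta = 0 the hypothesis is 0.5
```

The key difference from the question's code is that `z` is a single number, so `h` is a scalar and the cost and gradient terms have the expected shapes.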

npdoty
Kai