Machine Learning: Stochastic gradient descent for logistic regression in R: Calculating Eout and average number of epochs

Question

I am trying to write code to solve the following problem (as stated in HW5 of the Caltech course Learning from Data):

In this problem you will create your own target function f (probability in this case) and data set D to see how Logistic Regression works. For simplicity, we will take f to be a 0/1 probability so y is a deterministic function of x. Take d = 2 so you can visualize the problem, and let X = [-1, 1] × [-1, 1] with uniform probability of picking each x ∈ X. Choose a line in the plane as the boundary between f(x) = 1 (where y has to be +1) and f(x) = 0 (where y has to be -1) by taking two random, uniformly distributed points from X and taking the line passing through them as the boundary between y = ±1. Pick N = 100 training points at random from X, and evaluate the outputs y_n for each of these points x_n. Run Logistic Regression with Stochastic Gradient Descent to find g, and estimate Eout (the cross entropy error) by generating a sufficiently large, separate set of points to evaluate the error. Repeat the experiment for 100 runs with different targets and take the average. Initialize the weight vector of Logistic Regression to all zeros in each run. Stop the algorithm when ||w(t-1) - w(t)|| < 0.01, where w(t) denotes the weight vector at the end of epoch t. An epoch is a full pass through the N data points (use a random permutation of 1, 2, ..., N to present the data points to the algorithm within each epoch, and use different permutations for different epochs). Use a learning rate of 0.01.
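
The data-generation step in the problem statement can be sketched in R as follows. This is only one possible reading, not the course's reference code: the function names make_target and make_data are made up for illustration, the fixed seed is there for reproducibility, and the sketch assumes each input gets a constant first coordinate x0 = 1 so that the line's offset can be stored as a weight.

```r
# Sketch only: make_target() and make_data() are illustrative names.
set.seed(1)  # fixed seed so the sketch is reproducible

# Target: the line through two uniform random points p1, p2 in [-1, 1]^2,
# stored as weights (w0, w1, w2) so that y = sign(w0 + w1*x1 + w2*x2).
make_target <- function() {
  p1 <- runif(2, min = -1, max = 1)
  p2 <- runif(2, min = -1, max = 1)
  a <- p2[2] - p1[2]                 # (a, b) is a normal vector of the line
  b <- p1[1] - p2[1]
  c(-(a * p1[1] + b * p1[2]), a, b)  # offset chosen so the line passes through p1
}

# n points uniform in [-1, 1]^2 with a leading column of ones (x0 = 1),
# labelled +1 or -1 according to the side of the target line.
make_data <- function(n, w_target) {
  X <- cbind(1, matrix(runif(2 * n, min = -1, max = 1), ncol = 2))
  y <- sign(as.vector(X %*% w_target))
  list(X = X, y = y)
}

w_target <- make_target()
train <- make_data(100, w_target)
```

Storing the line as a weight vector rather than as slope and intercept also covers near-vertical lines, which a slope-based form handles poorly.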

I need to find the value closest to Eout for N = 100, and the average number of epochs needed to satisfy the stopping criterion.

I wrote and ran the code, but I'm not getting the expected answers (according to the solutions, Eout is near 0.1 and the number of epochs is near 350). With the stopping threshold of 0.01, the algorithm stops after far too few epochs (around 10), leaving the error too large (around 2). I then tried tightening the criterion to ||w(t-1) - w(t)|| < 0.001 (rather than 0.01). With that, the average number of epochs was about 250 and the out-of-sample error was about 0.35.
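
For reference, here is a minimal sketch of one reading of the SGD procedure and stopping rule from the problem statement. It is an assumption-laden illustration rather than the official solution: the name sgd_logistic, the max_epochs safety cap, the fixed seed and the example target are all hypothetical, and the inputs are assumed to carry a constant first coordinate x0 = 1, so the weight vector has three components. The per-point gradient of log(1 + exp(-y_n w^T x_n)) is -y_n x_n / (1 + exp(y_n w^T x_n)), where w^T x_n is a scalar inner product.

```r
# Sketch only: sgd_logistic() is an illustrative name; X has a leading column of ones.
sgd_logistic <- function(X, y, eta = 0.01, tol = 0.01, max_epochs = 10000) {
  w <- rep(0, ncol(X))                       # weights (bias included) start at zero
  for (epoch in seq_len(max_epochs)) {
    w_prev <- w                              # weights at the end of the previous epoch
    for (i in sample(nrow(X))) {             # fresh random permutation each epoch
      s <- y[i] * sum(w * X[i, ])            # y_n * w^T x_n, a scalar
      grad <- -y[i] * X[i, ] / (1 + exp(s))  # gradient of log(1 + exp(-y_n w^T x_n))
      w <- w - eta * grad
    }
    if (sqrt(sum((w - w_prev)^2)) < tol) break  # ||w(t-1) - w(t)|| < tol
  }
  list(w = w, epochs = epoch)
}

# Usage on a hypothetical target line (not taken from the course material).
set.seed(2)
w_target <- c(0.1, 1, -1)
X <- cbind(1, matrix(runif(200, min = -1, max = 1), ncol = 2))
y <- sign(as.vector(X %*% w_target))
fit <- sgd_logistic(X, y)
fit$epochs  # number of epochs until the stopping rule fired
```

The comparison is made between the weights at the end of consecutive epochs, not between consecutive single-point updates.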

Is there something wrong with my code/solution, or is it possible that the answers provided are faulty? I've added comments to indicate what I intend to do at each step. Thanks in advance.

library(pracma)

h<- 0 # h will later be updated to number of required epochs

p<- 0 # p will later be updated to Eout

C <- matrix(ncol=10000, nrow=2) # Testing set, used to calculate out of sample error

d <- matrix(ncol=10000, nrow=1)

for(i in 1:10000){
  C[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
  d[1, i] <- sign(C[2, i] - f(C[1, i])) 
}

for(g in 1:100){ # 100 runs of the experiment

  x <- runif(2, min = -1, max = 1)

  y <- runif(2, min = -1, max = 1)

  fit = (lm(y~x))

  t <- summary(fit)$coefficients[,1] 

  f <- function(x){   # Target function
    t[2]*x + t[1]
  }

  A <- matrix(ncol=100, nrow=2) # Sample data

  b <- matrix(ncol=100, nrow=1)

  norm_vec <- function(x) {sqrt(sum(x^2))} # vector norm calculator

  w <- c(0,0) # weights initialized to zero

  for(i in 1:100){

    A[, i] <- c(runif(2, min = -1, max = 1)) # Sample data

    b[1, i] <- sign(A[2, i] - f(A[1, i])) 
  }

  q <- matrix(nrow = 2, ncol = 1000) # q tracks the weight vector at the end of each epoch

  l= 1

  while(l < 1001){

    E <- function(z){ # cross entropy error function

      x = z[1]

      y = z[2]

      v = z[3]

      return(log(1 + exp(-v*t(w)%*%c(x, y))))
    }

    err <- function(xn1, xn2, yn){ #gradient of error function

      return(c(-yn*xn1, -yn*xn2)*(exp(-yn*t(w)*c(xn1,xn2))/(1+exp(-yn*t(w)*c(xn1,xn2)))))
    }

    e = matrix(nrow = 2, ncol = 100) # e will track the required gradient at each data point

    e[,1:100] = 0 

    perm = sample(100, 100, replace = FALSE, prob = NULL) # Random permutation of the data indices

    for(j in 1:100){ # One complete Epoch

      r = A[,perm[j]] # pick the perm[j]th entry in A

      s = b[perm[j]]  # pick the perm[j]th entry in b

      e[,perm[j]] = err(r[1], r[2], s) # Gradient of the error

      w = w - 0.01*e[,perm[j]] # update the weight vector according to the formula involving step size, gradient
    }

    q[,l] = w # the lth entry is the weight vector at the end of the lth epoch

    if(l > 1 & norm_vec(q[,l] - q[,l-1])<0.001){ # given criterion to terminate the algorithm

      break
    }
    l = l+1 # move to the next epoch
  }

  for(n in 1:10000){

    p[g] = mean(E(c(C[1,n], C[2, n], d[n]))) # average over 10000 data points, of the error function, in experiment no. g
  }

  h[g] = l #gth entry in the vector h, tracks the number of epochs in the gth iteration of the experiment

}

mean(h) # Mean number of epochs needed 

mean(p) # average Eout, over 100 experiments
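
A few details in the listing above may be worth checking, since each could plausibly contribute to the mismatch. First, the test labels d are computed from f before f is defined, and they are never recomputed for each run's new target. Second, inside the final loop over n, p[g] is reassigned on every pass, so after the loop it holds the error of the last test point only, not an average. Third, err uses t(w)*c(xn1,xn2), which is an elementwise product, where the formula calls for the inner product. Fourth, w has two components, so there is no intercept term and the fitted boundary is forced through the origin. The sketch below illustrates the second and third points under stated assumptions: cross_entropy is an illustrative name, and w_target and w_fit are hypothetical stand-ins, not values produced by the listing.

```r
# Sketch only: cross_entropy() is an illustrative name.
# Mean cross-entropy error over all rows of X at once.
cross_entropy <- function(w, X, y) {
  mean(log(1 + exp(-y * as.vector(X %*% w))))
}

set.seed(3)
w_target <- c(0.2, 1, -1)  # hypothetical target for one run
w_fit <- 5 * w_target      # hypothetical weights standing in for the SGD result

# Test set labelled with this run's target; x0 = 1 is the leading column.
X_test <- cbind(1, matrix(runif(2 * 10000, min = -1, max = 1), ncol = 2))
y_test <- sign(as.vector(X_test %*% w_target))
e_out <- cross_entropy(w_fit, X_test, y_test)  # one number per run

# Elementwise product versus inner product for a weight vector and an input.
w <- c(0.5, -2)
x <- c(1, 3)
elementwise <- t(w) * x    # 1x2 matrix holding 0.5 and -6
inner <- t(w) %*% x        # 1x1 matrix holding -5.5
```

With all-zero weights the same function returns log(2), about 0.693, for any data set, which gives a quick sanity check on an Eout estimate.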
