
I am using the example from here, where the original post had an objective function returning a list, with first element equal to the value of the objective function and the second element the gradient:

logisticRegressionCost <- function(theta, X, y) {
    theta <- as.matrix(theta)
    X <- as.matrix(X)
    y <- as.matrix(y)
    m <- dim(y)[1]  # number of training examples

    # shared computation: both the cost and the gradient need this
    predicted <- sigmoid(X %*% theta)

    # cross-entropy cost
    J <- sum((-y) * log(predicted) - (1 - y) * log(1 - predicted)) / m

    # gradient with respect to theta (same shape as theta)
    grad <- t(t(predicted - y) %*% X) / m

    return(list(fn = J, gr = grad))
}
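The snippet assumes a sigmoid helper from the original post; presumably it is the standard logistic function:

```r
# Standard logistic function, assumed by logisticRegressionCost above
sigmoid <- function(z) 1 / (1 + exp(-z))
```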

The suggested solution for use with optim is to split this into two wrapper functions, e.g.:

fn <- function(...){
   logisticRegressionCost(...)$fn
}
gr <- function(...){
   logisticRegressionCost(...)$gr
}

and optim can then be called as optim(par, fn = fn, gr = gr, ...).
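Putting the pieces together, the wrapper approach looks like the sketch below. The toy data is invented for illustration, and sigmoid and the cost function are repeated so the snippet runs standalone:

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

logisticRegressionCost <- function(theta, X, y) {
    theta <- as.matrix(theta)
    X <- as.matrix(X)
    y <- as.matrix(y)
    m <- dim(y)[1]
    predicted <- sigmoid(X %*% theta)
    J <- sum((-y) * log(predicted) - (1 - y) * log(1 - predicted)) / m
    grad <- t(t(predicted - y) %*% X) / m
    list(fn = J, gr = grad)
}

# wrappers: each call recomputes sigmoid(X %*% theta) from scratch
fn <- function(theta, X, y) logisticRegressionCost(theta, X, y)$fn
gr <- function(theta, X, y) as.vector(logisticRegressionCost(theta, X, y)$gr)

# toy data: intercept plus one feature, labels not perfectly separable
X <- cbind(1, c(-2, -1, 0, 1, 2))
y <- c(0, 0, 1, 0, 1)

# BFGS actually uses the supplied gradient (the default Nelder-Mead ignores gr)
res <- optim(c(0, 0), fn = fn, gr = gr, X = X, y = y, method = "BFGS")
res$value  # final cost, lower than the log(2) cost at theta = c(0, 0)
```

Note that extra arguments such as X and y are forwarded by optim to both fn and gr, so the wrappers only need theta as their first argument.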

However, this is unsatisfactory, as computing the gradient generally shares intermediate results with the objective function. In this case, the line:

predicted <- sigmoid(X %*% theta)

will definitely be duplicated.

Is there a way to use optim so that computations shared between the objective function and the gradient are performed efficiently?
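One common workaround (an assumption on my part, not from the original post) is to cache the most recent evaluation in a closure: optim typically requests the gradient at the same theta it just evaluated the objective at, so gr can reuse the cached result instead of recomputing. A minimal sketch:

```r
# Wrap a combined cost function (returning list(fn = ..., gr = ...)) so that
# fn and gr at the same theta trigger only one full evaluation.
make_cached <- function(cost, ...) {
    last_theta <- NULL
    last_value <- NULL
    eval_at <- function(theta) {
        if (is.null(last_theta) || !identical(theta, last_theta)) {
            last_theta <<- theta          # remember where we evaluated
            last_value <<- cost(theta, ...)  # one shared computation
        }
        last_value
    }
    list(
        fn = function(theta) eval_at(theta)$fn,
        gr = function(theta) as.vector(eval_at(theta)$gr)
    )
}

# hypothetical usage with the question's function:
# cached <- make_cached(logisticRegressionCost, X, y)
# optim(theta0, fn = cached$fn, gr = cached$gr, method = "BFGS")
```

Each new theta triggers one full evaluation; the gradient request that immediately follows at the same theta is served from the cache, so the shared line predicted <- sigmoid(X %*% theta) runs once per point rather than twice.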

Alex

0 Answers