I'm trying to implement a simple regularized logistic regression algorithm in Julia. I'd like to use the Optim.jl library to minimize my cost function, but I can't get it to work.
My cost function and gradient are as follows:
function cost(X, y, theta, lambda)
    m = length(y)
    h = sigmoid(X * theta)                                    # hypothesis values for all examples
    reg = (lambda / (2*m)) * sum(theta[2:end].^2)             # don't regularize the intercept term
    J = (1/m) * sum( (-y).*log(h) - (1-y).*log(1-h) ) + reg   # cross-entropy loss plus regularization
    return J
end
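(sigmoid isn't shown above; it's just the standard logistic function, which I define elsewhere roughly as:)

sigmoid(z) = 1 ./ (1 + exp(-z))   # elementwise logistic function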
function grad(X, y, theta, lambda, gradient)
    m = length(y)
    h = sigmoid(X * theta)
    # gradient = zeros(size(theta))
    gradient = (1/m) * X' * (h - y)                                # gradient of the unregularized loss
    gradient[2:end] = gradient[2:end] + (lambda/m) * theta[2:end]  # add the regularization term
    return gradient
end
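As a sanity check: with theta all zeros (the starting point in the output below), every h is 0.5 and the regularization term vanishes, so the cost should be exactly log(2) ≈ 0.6931, which is indeed the value the L-BFGS run below reports:

theta0 = zeros(size(X, 2))            # 5-element zero vector
println(cost(X, y, theta0, lambda))   # prints ~0.693147 = log(2)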
(Here theta is the vector of parameters for the hypothesis function and lambda is a regularization parameter.)
Then, following the instructions given here: https://github.com/JuliaOpt/Optim.jl, I try to call the optimization function like this:
# these are the closures I define in order to pass them as arguments:
c(theta::Vector) = cost(X, y, theta, lambda)
g!(theta::Vector, gradient::Vector) = grad(X, y, theta, lambda, gradient)

# then I do
optimize(c, initial_theta)
# or, to try a different algorithm:
optimize(c, g!, initial_theta, method = :l_bfgs)
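(For completeness: initial_theta is a zero vector, consistent with the Starting Point shown in the output below, and lambda is some positive constant; the value here is just a placeholder:)

initial_theta = zeros(size(X, 2))   # [0.0, 0.0, 0.0, 0.0, 0.0]
lambda = 1.0                        # placeholder value for illustration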
In both cases it fails to converge, and the output looks kind of awkward:
julia> optimize(c,initial_theta)
Results of Optimization Algorithm
* Algorithm: Nelder-Mead
* Starting Point: [0.0,0.0,0.0,0.0,0.0]
* Minimum: [1.7787162051775145,3.4584135105727145,-6.659680628594007,4.776952006060713,1.5034743945407143]
* Value of Function at Minimum: -Inf
* Iterations: 1000
* Convergence: false
* |x - x'| < NaN: false
* |f(x) - f(x')| / |f(x)| < 1.0e-08: false
* |g(x)| < NaN: false
* Exceeded Maximum Number of Iterations: true
* Objective Function Calls: 1013
* Gradient Call: 0
julia> optimize(c,g!,initial_theta,method = :l_bfgs)
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [0.0,0.0,0.0,0.0,0.0]
* Minimum: [-6.7055e-320,-2.235e-320,-6.7055e-320,-2.244e-320,-6.339759952602652e-7]
* Value of Function at Minimum: 0.693148
* Iterations: 1
* Convergence: false
* |x - x'| < 1.0e-32: false
* |f(x) - f(x')| / |f(x)| < 1.0e-08: false
* |g(x)| < 1.0e-08: false
* Exceeded Maximum Number of Iterations: false
* Objective Function Calls: 75
* Gradient Call: 75
Question
Is my method (from the first code listing) incorrect, or am I misusing the Optim.jl functions? In particular, I'm not sure whether the assignment to gradient inside grad actually fills in the array that Optim passes to g!. Either way, what is the proper way to define and minimize the cost function here?
It's my first time with Julia, and I'm probably doing something terribly wrong, but I can't tell what exactly. Any help will be appreciated!
EDIT
X and y are the training set: X is a 90x5 matrix and y a 90x1 vector (my training set is taken from Iris - I don't think it matters).
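Schematically, the data looks like this (a simplified stand-in with random values, just to show the shapes; the real numbers come from Iris):

X = hcat(ones(90), rand(90, 4))   # 90x5 design matrix with an intercept column
y = float(rand(0:1, 90))          # 90-element vector of 0/1 labels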