
I am trying to run a logistic regression, using fminunc to optimize my parameters. The code never fails, but fminunc seems to run only once and not converge. The exit flag that fminunc returns is -3, which, according to the documentation, means "The trust region radius became excessively small". What does this mean, and how can I fix it?

This is my main:

load('data.mat');
% returns matrix X, a matrix of data

% Initialize parameters
[m, n] = size(X);
X = [ones(m, 1), X];
initialTheta = zeros(n + 1, 1); 
alpha = 1;
lambda = 0;

costfun = @(t) costFunction(t, X, surv, lambda, alpha);
options = optimset('GradObj', 'on', 'MaxIter', 1000);
[theta, cost, info] = fminunc(costfun, initialTheta, options);

And the cost function:

function [J, grad] = costFunction(theta, X, y, lambda, alpha)

%COSTFUNCTION Implements a logistic regression cost function.
%   [J grad] = COSTFUNCTION(initialParameters, X, y, lambda) computes the cost
%   and the gradient for the logistic regression. 
% 

m = size(X, 1);

J = 0;
grad = zeros(size(theta));

% un-regularized
z = X * theta;
J = (-1 / m) * y' * log(sigmoid(z)) + (1 - y)' * log(1 - sigmoid(z));
grad = (alpha / m) * X' * (sigmoid(z) - y);

% regularization
theta(1) = 0;
J = J + (lambda / (2 * m)) * (theta' * theta);
grad = grad + alpha * ((lambda / m) * theta);

endfunction

Any help is much appreciated.

feargmac
  • I don't know how to solve this, but I suspect mathematically what's happening is that your function behaves in unpredictable ways, such that the [trust region](http://www.applied-mathematics.net/optimization/optimizationIntro.html) approach is failing to find decent solutions, and instead keeps 'shrinking' the trust region because each result is worse than the previous one, until it decides the trust region has grown too small to be of any practical use. It may just be that if you choose a different starting point away from the problem area, the algorithm might behave better and thus converge. – Tasos Papastylianou Aug 01 '18 at 18:22
  • The introductory comment to the [relevant section of the octave manual](https://octave.org/doc/interpreter/Minimizers.html) may also be relevant; is your function definitely a continuous differentiable one, or does it have discontinuities? Note also the bit in fminunc that says `Application Notes: If the objective function is a single nonlinear equation of one variable then using fminbnd is usually a better choice.` and `If the function has discontinuities it may be better to use a derivative-free algorithm such as fminsearch.` – Tasos Papastylianou Aug 01 '18 at 18:25

1 Answer


There are a few issues with the code above:

Using fminunc means you don't have to provide an alpha (learning rate); the optimizer chooses its own step sizes. Remove all instances of it from the code, and your gradient lines should look like the following

grad = (1 / m) * X' * (sigmoid(z) - y);

and

grad = grad + ((lambda / m) * theta);  % This isn't quite correct, see below

In the regularization of the grad, you can't use theta as-is, because the theta for j = 0 (the bias term) should not be regularized. There are a number of ways to do this, but here is one

temp = theta;
temp(1) = 0;
grad = grad + ((lambda / m) * temp);

You're missing a set of brackets in your cost function. The (-1 / m) is being applied to only a portion of the rest of the expression. It should look like this:

J = (-1 / m) * ( y' * log(sigmoid(z)) + (1 - y)' * log(1 - sigmoid(z)) );

And finally, as a nit, a lambda value of 0 means that your regularization does nothing.
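Putting these fixes together, the corrected cost function might look like the sketch below (it assumes, as in the original code, that a sigmoid function is already defined on the path):

function [J, grad] = costFunction(theta, X, y, lambda)
%COSTFUNCTION Regularized logistic regression cost and gradient.
%   Note: no alpha parameter -- fminunc controls its own step sizes.

m = size(X, 1);
h = sigmoid(X * theta);

% un-regularized cost and gradient (brackets around both log terms)
J = (-1 / m) * ( y' * log(h) + (1 - y)' * log(1 - h) );
grad = (1 / m) * X' * (h - y);

% regularization, skipping the bias term theta(1)
temp = theta;
temp(1) = 0;
J = J + (lambda / (2 * m)) * (temp' * temp);
grad = grad + (lambda / m) * temp;

endfunction

The call in the main script would then drop alpha as well: costfun = @(t) costFunction(t, X, surv, lambda);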

Ron Z
  • Good point on the alpha, and for the lambda I was just putting it in as 0 for now, to optimize after I got the code working. I see that I messed up the brackets, and when I fixed that, the code ran and produced a new error that I will investigate, but it seems to have solved the trust region issue. I believe that the missing brackets were causing the function to fail to converge. I have accepted your answer and given it a positive vote, but my lack of site reputation causes it not to be shown. Thanks for the help! – feargmac Aug 03 '18 at 21:01
  • Your problem may be that your alpha is too large for your data set. Try a smaller one – Ron Z Aug 04 '18 at 22:42