
I am trying to find the coefficients of an equation that models the step response of a motor, which is of the form 1 - e^x. The equation I'm using as the model is of the form

a(1)*t^2 + a(2)*t^3 + a(3)*t^4 + ...

(It is derived in a research paper on solving for motor parameters.)

Sometimes using fminunc to find the coefficients works out fine: I get a good result and it matches the training data fairly well. Other times the returned coefficients are terrible (the fitted output is orders of magnitude higher than it should be). This happens especially once I start using higher-order terms: any model that uses x^8 or higher (x^9, x^10, x^11, etc.) always produces bad results.

Since it works sometimes, I can't think why my implementation would be wrong. I have tried fminunc both with and without providing the gradients, and there is no difference. I've also looked into other functions to solve for the coefficients, like polyfit, but polyfit has to include every power from the constant term up to the highest order, whereas my model's lowest power is 2.
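
Since the model is linear in the coefficients, an ordinary linear least-squares solve (the backslash operator) is another option I've considered instead of fminunc. A minimal sketch, assuming the same indep_x, Y, and max_power variables defined in the code below:

exps_ls = 2:max_power;                                       %powers 2 through max_power
X_ls = bsxfun(@power, repmat(indep_x, 1, numel(exps_ls)), exps_ls);
theta_ls = X_ls \ Y;                                         %least-squares solution of X_ls*theta = Y

This removes the optimizer from the picture, although the ill-conditioning of high powers remains.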

Here is the main code:

clear;

%Overall Constants
max_power = 7;

%Loads in data
%data = load('TestData.txt');
load testdata.mat

%Sets data into variables
indep_x = data(:,1); Y = data(:,2);

%number of data points
m = length(Y);

%X is a matrix whose columns are powers of the independent variable
exps = [2:max_power];
X_prime = repmat(indep_x, 1, max_power-1); %Repeats columns of the indep var
X = bsxfun(@power, X_prime, exps);

%Initializes theta to rand vals
init_theta = rand(max_power-1,1);

%Sets up options for fminunc
options = optimset( 'MaxIter', 400, 'Algorithm', 'quasi-newton');

%fminunc minimizes the output of the cost function by changing the theta parameters
[theta, cost] = fminunc(@(t)(costFunction(t, X, Y)), init_theta, options)

%Computes the model output for the fitted theta and plots it against the data
Y_line = X * theta;

figure; hold on;
plot(indep_x, Y, 'or');
plot(indep_x, Y_line, 'bx');

And here is costFunction:

function [J, Grad] = costFunction (theta, X, Y)
    %number of training examples
    m = length(Y);

    %Initialize Cost and Grad-Vector
    J = 0;
    Grad = zeros(size(theta));

    %Produces the model output for the current values of theta
    model_output = X * theta;

    %Computes the squared error for each example then adds them to get the total error
    squared_error = (model_output - Y).^2;
    J = (1/(2*m)) * sum(squared_error);

    %Computes the gradients for each theta t
    for t = 1:size(theta, 1)
        Grad(t) = (1/m) * sum((model_output-Y) .* X(:, t));
    end

endfunction
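
For reference, the gradient loop inside costFunction can also be written in one vectorized line; this is just an equivalent restatement of the same derivative of J, not a fix for the instability:

    %Equivalent vectorized gradient of J = (1/(2*m))*sum((X*theta - Y).^2)
    Grad = (1/m) * (X' * (model_output - Y));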

(Screenshots: "Sudden Bad Result" plot and "Good Result" plot.)

Any help or advice would be appreciated.

  • Using t^n to higher and higher powers is not a good _global_ basis for a function, so I would expect that the method will break down as you attempt to use higher order polynomials. – stephematician Oct 23 '16 at 07:42
  • I would use a different basis for your approximate response, if you want to use a polynomial, chebyshev polynomials would be a good start. – stephematician Oct 23 '16 at 07:43
  • High order polynomials are very difficult to deal with: they swing around a lot. I already get nervous when I see `x^4`. Start with a much more restrictive model, and if possible use a good starting point and apply good bounds on the parameters you want to estimate (you need a solver that handles bounds for this). – Erwin Kalvelagen Oct 24 '16 at 18:47
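
As a sketch of stephematician's suggestion about a better-behaved basis: Chebyshev polynomials of the first kind can be built with the three-term recurrence after mapping the independent variable onto [-1, 1]. The variable names here are illustrative and assume the indep_x, Y, and max_power from the question:

t_scaled = 2*(indep_x - min(indep_x)) / (max(indep_x) - min(indep_x)) - 1;  %map onto [-1, 1]
n_basis = max_power;                           %number of Chebyshev terms T0..T_{max_power-1}
T = zeros(length(t_scaled), n_basis);
T(:,1) = ones(size(t_scaled));                 %T0(x) = 1
T(:,2) = t_scaled;                             %T1(x) = x
for k = 3:n_basis
    T(:,k) = 2*t_scaled.*T(:,k-1) - T(:,k-2);  %T_k = 2*x*T_{k-1} - T_{k-2}
end
theta_cheb = T \ Y;                            %least-squares fit in the Chebyshev basis
Y_cheb = T * theta_cheb;                       %fitted values for plotting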

1 Answer


Try adding regularization to your costFunction:

function [J, Grad] = costFunction (theta, X, Y, lambda)
    m = length(Y);

    %Initialize Cost and Grad-Vector
    J = 0;
    Grad = zeros(size(theta));

    %Produces the model output for the current values of theta
    model_output = X * theta;

    %Computes the squared error for each example then adds them to get the total error
    squared_error = (model_output - Y).^2;
    J = (1/(2*m)) * sum(squared_error);
    % Regularization
    J = J + lambda*sum(theta(2:end).^2)/(2*m);


    %Computes the gradients for each theta t
    regularizator = lambda*theta/m;
    %Overwrite the 1st element, i.e. the one corresponding to theta zero, so it is not regularized
    regularizator(1) = 0;
    for t = 1:size(theta, 1)
        Grad(t) = (1/m) * sum((model_output-Y) .* X(:, t)) + regularizator(t);
    end

endfunction

The regularization parameter lambda controls how strongly large coefficients are penalized. Start with lambda = 1. The greater the value of lambda, the more the coefficients are shrunk toward zero. Increase lambda if the behavior you describe persists. You may need to increase the number of iterations if lambda gets high. You may also consider normalizing your data, and using some heuristic for initializing theta - setting all theta to 0.1 may be better than random values. If nothing else, it will give you better reproducibility from one training run to the next.
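
For completeness, a minimal sketch of how the four-argument costFunction could be called from the existing main script (lambda = 1 is only the suggested starting value):

    lambda = 1;   %regularization strength; increase it if the fit still blows up
    [theta, cost] = fminunc(@(t)(costFunction(t, X, Y, lambda)), init_theta, options)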

Iliyan Bobev
  • Adding the regularization certainly helped, but in some instances it still fails. I turned on the exitflag and output. Whenever I get the results, it has an exit flag of 1 (magnitude of gradient is smaller than tolerance) and only runs 2 or 3 iterations. I think that might be indicative of something. – user3047023 Oct 24 '16 at 07:12
  • Regular sampling and higher-order polynomials don't play very well together - although regularisation can help, avoiding a numerically ill-conditioned system is difficult. Interpolation might be a different problem to what you are attempting, but it demonstrates part of the issue: https://en.wikipedia.org/wiki/Lagrange_polynomial. I would recommend using a different sampling procedure (i.e. be more careful about where you try to match the function) or a different set of basis functions. – stephematician Oct 24 '16 at 08:37