I am trying to solve the following optimization problem in octave
The first contraint is that A be positive semi-definite. S is a set of data points such that if (xi,xj) is in S then xi is similar to xj and D is a set of data points such that if (xi,xj) is in D then xi and xj are dissimilar. Note that the above formula is 2 separate sums and the second sum is not nested. Also xi and xj are assumed to be column vectors of length N.
Because this is a nonlinear optimization I am trying to use octave's nonlinear program solver, sqp. The problem is that if I just provide it with the function to optimize, on some small toy tests the, BFGS method to find the Hessian fails. Because of this I tried to provide my own Hessian function but now this problem occurs
error: __qp__: operator *: nonconformant arguments (op1 is 2x2, op2 is 3x1)
error: called from:
error: /usr/share/octave/3.6.3/m/optimization/qp.m at line 393, column 26
error: /usr/share/octave/3.6.3/m/optimization/sqp.m at line 414, column 32
when I make the following call to sqp
[A, ~, Info] = sqp(initial_guess, {@toOpt, @CalculateGradient,@CalculateHessian},
[],[],0,[],maxiter);
I simplified the constraint that A be positive semi-definite and diagonal by only solving for the diagonal entries and constraining all the diagonal entries to be >=0. initial_guess is a vector of ones that is N long.
Here is my code to calculate what I believe to be the Hessian matrix
%Hessian = CalculateHessian(A)
%calculates the Hessian of the function we are optimizing as follows
%H(i,j) = (sumsq(D(:,i),1) * sumsq(D(:,j),1)) / (sum(A.*sumsq(D,1))^2)
%where D is a matrix of of differences between observations that are dissimilar, with one difference on each row
%and sumsq is the sum of the squares
%input A: the current guess for A
%output Hessian: The hessian of the function we are optimizing
function Hessian = CalculateHessian(A)
global HessianNumerator; %this is a matrix with the numerator of H(i,j)
global Dsum_of_squares; %the sum of the squares of the differences of each dimensions of the dissimilar observations
if(iscolumn(A)) %if A is a column vector
A = A'; %make it a row vector. necessary to prevent broadcasting
endif
if(~isempty(Dsum_of_squares)) %if disimilar constraints were provided
Hessian = HessianNumerator / (sum(A.*Dsum_of_squares)^2)
else
Hessian = HessianNumerator; %the hessian is a matrix of 0s
endif
endfunction
and Dsum_of_squares and HessianNumertor are
[dissimilarRow,dissimilarColumn] = find(D); %find which observations are dissimilar to each other
DissimilarDiffs = X(dissimilarRow,:) - X(dissimilarColumn,:); %take the difference between the dissimilar observations
Dsum_of_squares = sumsq(DissimilarDiffs,1);
HessianNumerator = Dsum_of_squares .* Dsum_of_squares'; %calculate the numerator of the Hessian. it is a constant value
X is a M x N matrix with one observation per row.
D is a M x M dissimilarity matrix. if D(i,j) is 1 then row i of X is dissimlar to row j. 0 otherwise.
I believe my error is in one of the following areas (from least likely to most likely)
- The math I used to derive the Hessian function is wrong. The formula I am using is in my comments for the function.
- My implementation of the math.
- The Hessian Matrix that sqp wants is different from the one described on the Hessian Matrix Wikipedia page.
Any help would be greatly appreciated. If you need me to post more code I would be happy to do so. Right now the amount of code to try and solve the optimization is about 160 lines.
Here is the test case I am running that causes the code to fail. It works if I only pass it the gradient function.
X = [1 2 3;
4 5 6;
7 8 9;
10 11 12];
S = [0 1 1 0;
1 0 0 0;
1 0 0 0;
0 0 0 0]; %this means row 1 of X is similar to rows 2 and 3
D = [0 0 0 0;
0 0 0 0;
0 0 0 1;
0 0 1 0]; %this means row 3 of X is dissimilar to row 4
gml(X,S,D, 200); %200 is the maximum number of iterations for sqp to run