
I am working on a program that involves optimizing an objective function obj over a scalar beta. The true global minimum beta0 is set to beta0=1.

In the MWE below, obj is constructed as the sum of the 100-R (here I use R=3) smallest eigenvalues of the 100x100 symmetric matrix u'*u. While obj "looks good" around the true global minimum, the objective function becomes very unstable when I evaluate it at much larger values of beta: running the MWE you can see that multiple local minima (and maxima) appear, associated with values of obj(beta) smaller than the value at the true global minimum.

My guess is that there is some sort of "numerical instability" going on, but I am unable to find the source.


%Matrix dimensions
N=100;
T=100;

%Reproducibility
rng('default');

%True global minimum
beta0=1;

%Generating data

l=1+randn(N,2);
s=randn(T+1,2);
la=1+randn(N,2);
X(1,:,:)=1+(3*l+la)*(3*s(1:T,:)+s(2:T+1,:))';
s=s(1:T,:);
a=(randn(N,T));
Y=beta0*squeeze(X(1,:,:))+l*s'+a;

%Give "beta" a large value
beta=1e6;

%Compute objective function
u=Y-beta*squeeze(X(1,:,:));

ev=sort(eig(u'*u));  % sort eigenvalues

obj=sum(ev(1:100-3))/(N*T); % "obj" is sum of 97 smallest eigenvalues



This evaluates the objective function at beta=1e6. I have noticed that some of the eigenvalues returned by eig(u'*u) are negative (see the vector ev), even though by construction the matrix u'*u is positive semidefinite.

I am guessing this may have to do with floating point arithmetic issues and may (partly) be the answer to the instability of my function, but I am not sure.
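
A quick check along these lines (a sketch, reusing Y, X, N and T from the snippet above and beta=1e6): the negative eigenvalues should sit within a modest multiple of the rounding level at the scale of the largest eigenvalue, while the squared singular values of u, which are the same quantities mathematically, never come out negative.

u  = Y - 1e6*squeeze(X(1,:,:));
ev = sort(eig(u'*u));
fprintf('smallest eigenvalue            : %g\n', ev(1));
fprintf('largest eigenvalue             : %g\n', ev(end));
fprintf('rounding level at that scale   : %g\n', eps(ev(end)));
% The eigenvalues of u'*u are the squared singular values of u;
% svd() by construction never returns negative values.
ev_svd = sort(svd(u).^2);
fprintf('smallest squared singular value: %g\n', ev_svd(1));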

Finally, this is what the objective function obj looks like when evaluated over a wide range of values for beta:


% Now plot "obj" for a wide range of values of "beta"
clear obj
betaGrid=-5e5:100:5e5;

for i=1:length(betaGrid)
    u=Y-betaGrid(i)*squeeze(X(1,:,:));
    ev=sort(eig(u'*u));
    obj(i)=sum(ev(1:100-3))/(N*T); 
end

plot(betaGrid,obj,"*")
xlabel('\beta') 
ylabel('obj')


This gives a figure showing how unstable the objective becomes for extreme values of beta.

econ86
  • If you do not disclose information about your function and equation, how are we supposed to help? Lr(beta) could be `if(abs(beta))>10 rand();` for all I know. Read [ask]. – Ander Biguri Aug 14 '19 at 14:59
  • At those huge numbers particular floating point operations may create errors that may propagate with some other floating point operations. Or not, depends on the equation – Ander Biguri Aug 14 '19 at 15:03
  • The "instability" is not in the minimization algorithm, but in the computation of the function L. Nothing can be said without information on L. –  Aug 14 '19 at 15:25

1 Answer


The key here is noticing that computing eigenvalues can be a hard problem. The condition number of this problem is K = norm(A) * norm(inv(A)) (don't compute it this way, use cond()). This means that a (relative) perturbation in the input (i.e. the matrix entries) gets amplified by the condition number when computing the output.

I modified your code a little bit to compute and plot the condition number in each step. It turns out that for a large part of the range you are interested in it is greater than 10^17, which is abysmal. (Note that double precision floating point numbers are accurate to not quite 16 significant decimal digits, so here even the representation error of the input produces errors that make every digit "insignificant".) This already explains the bad behaviour. You should also note that the largest eigenvalues can usually be computed quite accurately, while the errors in the smaller ones (in magnitude) usually increase.

[Figure: the objective obj (left) and the condition number of u'*u (right) over the beta grid]
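
To put a number on that amplification, here is a back-of-the-envelope check (a sketch, with u taken from the question's snippet at beta=1e6):

A = u'*u;
% Doubles carry a relative representation error of about eps ~ 2.2e-16;
% the condition number says how much of that can show up in the output.
worstCaseRelErr = eps * cond(A);
fprintf('cond(A) = %.2e, worst-case relative output error ~ %.2e\n', ...
        cond(A), worstCaseRelErr);
% Once this reaches 1 or more, essentially no digit of the result is reliable.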

If the condition number were better (closer to 1) I would have suggested computing the singular values instead, as for this symmetric (positive semidefinite) matrix they coincide with the eigenvalues. The SVD is numerically more stable, but with such a bad condition number even that will not help. In the following modification of your final snippet I added a graph that plots the condition number.
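
A minimal sketch of that SVD variant, with the same variables as in the question:

A  = u'*u;
sv = sort(svd(A));               % for symmetric PSD A these equal the eigenvalues
obj_svd = sum(sv(1:100-3))/(N*T);
% Working with sort(svd(u).^2) instead would avoid forming u'*u at all,
% but with a condition number this bad it still would not save the computation.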

The only case where anything is salvageable is R=0: then we actually want the sum of all eigenvalues, which happens to be the trace of the matrix and can easily be computed by just summing the diagonal entries.
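
A sketch of that R=0 shortcut:

A = u'*u;
objR0 = trace(A)/(N*T);          % same value as sum(eig(A))/(N*T), but cheap and stable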

To summarize: this problem seems to be inherently ill-conditioned, so it doesn't really matter how you compute it. If you have a completely different formulation of the same problem, that might help.

% Now plot "obj" for a wide range of values of "beta"
clear obj
L = 5e5; % decrease to 5e-1 to see that the condition number is still >1e9 around the optimum
betaGrid=linspace(-L,L,1000);
condition = nan(size(betaGrid));
obj = nan(size(betaGrid));   % preallocate to avoid growing the array in the loop
for i=1:length(betaGrid)
    disp(i/length(betaGrid))
    u=Y-betaGrid(i)*squeeze(X(1,:,:));
    A = u'*u;
    ev=sort(eig(A));
    condition(i) = cond(A);
    obj(i)=sum(ev(1:100-3))/(N*T); % for R=0 use trace(A)/(N*T);
end

subplot(1,2,1);
plot(betaGrid,obj,"*")
xlabel('\beta') 
ylabel('obj')
subplot(1,2,2);
semilogy(betaGrid, condition);
title('condition number');
flawr
  • Thank you for your answer, @flawr! As you comment, the condition number is still huge around the optimum (around 10^13). However, in that neighborhood the objective function seems well behaved (or equivalently, the eigenvalues are computed correctly). Am I right to assume, as a rule of thumb, that problems "start" in this particular case when the condition number gets close to 10^16 (about the 16 digits of precision)? – econ86 Aug 15 '19 at 11:35
  • No I don't think it is that clear. If you consider the close proximity of the optimum you still have condition numbers of around 10^9 which means you still shouldn't trust the output of your computation at all. So I'd interpret it the other way around: If your condition number is at 10^16 there is *no way* your results can make sense (because you really "lose" all the information, as even the rounding errors of the input cause the algorithm to diverge), and if it is lower they might look better but there is still no reason to trust the results. – flawr Aug 15 '19 at 11:56
  • I see. To clarify, if I do something simple like `cond(chi2rnd(1,100,100))` the condition number is already high, `2.2293e+03`. I understand a large condition number like `2.2293e+03` as indicating a loss of precision, but such a loss may still be "negligible" for a particular problem (which may be the case in my program near the optimum, since `beta=1` clearly remains the global minimum). Does this sound right? Thanks again! – econ86 Aug 15 '19 at 13:45
  • Yes, one could say that. But note that the condition (of a problem) is defined as the "amplification" of the relative error: say we perturb the input by `p` percent and this causes the output to be perturbed by `q` percent, then the condition is (roughly) `q/p`. And the condition is a property of the *problem* (not the *algorithm*; if we analyze the same thing for an *algorithm* we talk about *stability*). – flawr Aug 15 '19 at 14:58
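
The `q/p` amplification described in this last comment can be illustrated with a tiny experiment (a sketch, reusing u from the question's snippet): perturb the input by a relative amount p and measure the relative change q of the smallest eigenvalue.

A  = u'*u;
p  = 1e-10;                              % relative size of the input perturbation
E  = randn(size(A)); E = (E + E')/2;     % random symmetric direction
dA = p * norm(A) * E / norm(E);          % perturbation with norm(dA) = p*norm(A)
ev1 = sort(eig(A));
ev2 = sort(eig(A + dA));
q = abs(ev2(1) - ev1(1)) / abs(ev1(1));  % relative change of the smallest eigenvalue
fprintf('empirical amplification q/p ~ %.2e\n', q/p);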