How to tell if Newtons-Method Fails

Question

I am creating a basic Newton-method algorithm for an unconstrained optimization problem, and my results from the algorithm are not what I expected. It is a simple objective function so it is clear that the algorithm should converge on (1,1). This is confirmed by a gradient descent algorithm I created previously, here:

def grad_descent(x, t, count, magnitude):
    xvalues.append(x)
    gradvalues.append(np.array([dfx1(x), dfx2(x)]))
    fvalues.append(f(x))   
    temp=x-t*dfx(x)
    x = temp
    magnitude = mag(dfx(x))    
    count+=1

    return xvalues, gradvalues, fvalues, count

My attempt at creating an algorithm for Newtons-Method is here:

def newton(x, t, count, magnitude):
  xvalues=[]
  gradvalues=[]
  fvalues=[]
  temp=x-f(x)/dfx(x)

  while count < 10:
    xvalues.append(x)
    gradvalues.append(dfx(x))
    fvalues.append(f(x))  

    temp=x-t*f(x)/dfx(x)
    x = temp
    magnitude = mag(dfx(x))    
    count+=1
    if count > 100:
      break
  return xvalues, gradvalues, fvalues, count

Here is the objective function and gradient function:

f = lambda x: 100*np.square(x[1]-np.square(x[0])) + np.square((1-x[0]))
dfx = lambda x: np.array([-400*x[0]*x[1]+400*np.power(x[0],3)+2*x[0]-2, 200*(x[1]-np.square(x[0]))])

Here are the initial conditions. Note that alpha and beta are not used in the newton method.

x0, t0, alpha, beta, count = np.array([-1.1, 1.1]), 1, .15, .7, 1
magnitude = mag(np.array([dfx1(x0), dfx2(x0)]))

To call the function:

xvalues, gradvalues, fvalues, iterations = newton(x0, t0, count, magnitude)

This produce very strange results. Here are the first 10 iterations of the xvalues, gradient values, and function solution for its respective x input:

[array([-1.1,  1.1]), array([-0.99315589,  1.35545455]), array([-1.11651296,  1.11709035]), array([-1.01732476,  1.35478987]), array([-1.13070578,  1.13125051]), array([-1.03603697,  1.35903467]), array([-1.14368874,  1.14364506]), array([-1.05188162,  1.36561528]), array([-1.15600558,  1.15480705]), array([-1.06599492,  1.37360245])]
[array([-52.6, -22. ]), array([142.64160215,  73.81918332]), array([-62.07323963, -25.90216846]), array([126.11789251,  63.96803995]), array([-70.85773749, -29.44900758]), array([114.31050737,  57.13241151]), array([-79.48668009, -32.87577304]), array([104.93863096,  51.83206539]), array([-88.25737032, -36.308371  ]), array([97.03403558, 47.45145765])]
[5.620000000000003, 17.59584998020613, 6.156932949106968, 14.29937453260906, 6.7080172227439725, 12.305727666787176, 7.297442528545537, 10.926625703722639, 7.944104584786208, 9.89743708419569]

Here is the final output:

final_value = print('Final set of x values: ', xvalues[-1])
final_grad = print('Final gradient values: ', gradvalues[-1])
final_f = print('Final value of the object function with optimized inputs: ', fvalues[-1])
final_grad_mag = print('Final magnitude of the gradient with optimized inputs: ', mag(np.array([dfx1(xvalues[-1]), dfx2(xvalues[-1])])))
total_iterations = print('Total iterations: ', iterations)

a 3d plot is shown here code:

x = np.array([i[0] for i in xvalues])
y = np.array([i[1] for i in xvalues])
z = np.array(fvalues)
fig = plt.figure()
ax = fig.gca(projection='3d')
ax.scatter(x, y, z, label='Newton Method')
ax.legend()

Is the reasoning for this because the initial guess is so close to the optimal point, or is there some error in my algorithm that I am not catching? Any advice would be greatly appreciated. It looks like the solution may even be oscillating, but it is difficult to tell

I think we need to see the functions `f(x)` and `dfx(x)`. Also, what is the purpose (and value) of `t` in your calculations? In short, this question needs a [mcve]. — user3386109, Sep 10 '18 at 22:07
Just added in f(x) and df(x). t is set to 1 for this specific algorithm, so it has no purpose. — RocketSocks22, Sep 10 '18 at 22:12
I suspect that it's only a line or two, but could you please finish this with a main program that produces the output you show? — Prune, Sep 10 '18 at 22:15
Just added in the function call, initial conditions, and output (anonymously on accident...) Apologies in advance for not formatting the question properly. I am new to stackoverflow and still trying to learn the ropes — RocketSocks22, Sep 10 '18 at 22:27
Is the function and Hessian correct? To me it seems the minimum for the given function is at (1,1) not (-1,1). — Hans Olsson, Sep 11 '18 at 12:18
You are correct that it is meant to converge at (1, 1). I've fixed that mistake in the original question. This is confirmed by the gradient descent algorithm — RocketSocks22, Sep 11 '18 at 12:54

RocketSocks22 · Answer 1 · 2018-09-11T14:22:46.097

0

I think I've found part of the problem. I was using the incorrect Newton's algorithm. While before I was using:
x_{k+1} = x_{k}-^f(x)⁄_∇f(x)

The correct update is:
x_{k+1} = x_{k} - [f''(x_{k})]^-1f'(x_{k})

When I changed this, the result still diverges, but it is slightly better. The new function is here:

f = lambda x: 100*np.square(x[1]-np.square(x[0])) + np.square((1-x[0]))
dfx1 = lambda x: -400*x[0]*x[1]+400*np.power(x[0],3)+2*x[0]-2
dfx2 = lambda x: 200*(x[1]-np.square(x[0]))
dfx = lambda x: np.array([-400*x[0]*x[1]+400*np.power(x[0],3)+2*x[0]-2, 200*(x[1]-np.square(x[0]))])
dfx11 = lambda x: -400*(x[1])+1200*np.square(x[0])+2
dfx12 = lambda x: -400*x[0]
dfx21 = lambda x: -400*x[0]
dfx22 = lambda x: 200
hessian = lambda x: np.array(([dfx11(x0), dfx12(x0)], [dfx21(x0), dfx22(x0)]))
inv_hessian = lambda x: inv(np.array(([dfx11(x0), dfx12(x0)], [dfx21(x0), dfx22(x0)])))  

def newton(x, t, count, magnitude):
  xvalues=[]
  gradvalues=[]
  fvalues=[]
  temp = x-(inv_hessian(x).dot(dfx(x)))

  while count < 25:
    xvalues.append(x)
    gradvalues.append(dfx(x))
    fvalues.append(f(x))  

    temp = x-(inv_hessian(x).dot(dfx(x)))
    x = temp
    magnitude = mag(dfx(x))    
    count+=1
    if count > 100:
      break
  return xvalues, gradvalues, fvalues, count

The nearest the solution gets to converging is after the first step, where it goes to (-1.05, 1.1). However, it still diverges. I'm never worked with the Newton-method so I am unsure if this is as accurate as the algorithm is meant to get or not.

edited Sep 11 '18 at 14:22

answered Sep 11 '18 at 00:26

RocketSocks22

391
1
4
20

The one thing that's right about this is that the root of the derivative is either a maximum or a minimum of the function. The thing that's still unclear is whether the function, its derivative, and its second derivative are actually valid and related. – user3386109 Sep 11 '18 at 01:38
What do you mean by 'valid and related'? – RocketSocks22 Sep 11 '18 at 01:52
"Related" means that `dfx` is actually the derivative of `f`, and `f''` is actually the derivative of `dfx`. "Valid" means smooth and continuous, and `f''` doesn't have a zero near `x`. If `f''` does have a zero near `x`, then there's a divide-by-zero problem with the algorithm. – user3386109 Sep 11 '18 at 02:02
I am certain that the derivatives are correct. As far as the validity goes, I am unsure how to check for zeroes for f''. And what constraints qualify as 'near' x? <.01 difference? – RocketSocks22 Sep 11 '18 at 02:30
I think the best approach is to check f" before doing the division. If the value is less than epsilon (for example 1e-12), then throw an exception. – user3386109 Sep 11 '18 at 02:36
Neat properties of floating point arithmetic: dividing by near-0 quantities is only a problem if they're near 0 because of some process that causes a loss of precision. You would still need to check that it's not **exactly** 0, but there isn't any reason to choose 1e-1 vs 1e-200 for a near-0 division check at runtime. – Hans Musgrave Sep 11 '18 at 02:40
The reason you can get loss of precision from that is if the process that produced the near-0 quantity had a loss of precision itself. Then division amplifies the problem. A classic example is subtracting two numbers that are large relative to their difference. – Hans Musgrave Sep 11 '18 at 02:42
I understand, but f'' is not simply one number in this case. f'' is the 2x2 Hessian matrix. How would I check that a matrix is not close to zero? Also, with the updated algorithm, f'' is not being divided so I do not think that this should be a problem – RocketSocks22 Sep 11 '18 at 02:55

score 0 · Answer 2 · answered Sep 12 '18 at 15:30

I am now certain that there is something wrong with the python code. I decided to implement the algorithm in Matlab and it seems to work fine. Here is that code:

clear; clc;
x=[-1.1, 1.1]';
t=1;
count=1;

xvalues=[];

temp = x - inv([(-400*x(2)+1200*x(1)^2+2), -400*x(1); -400*x(1), 200]);
disp(x-inv([(-400*x(2)+1200*x(1)^2+2), -400*x(1); -400*x(1), 200])*[-400*x(1)*x(2)+400*x(1)^3+2*x(1)-2; 200*(x(2)-x(1)^2)])

while count<10
    xvalues(count,:)= x;
    temp = x - inv([(-400*x(2)+1200*x(1)^2+2), -400*x(1); -400*x(1), 200]) * [-400*x(1)*x(2)+400*x(1)^3+2*x(1)-2; 200*(x(2)-x(1)^2)];    
    x = temp;
    count = count+1;
end

disp(xvalues)

Output:

-1.1000    1.1000
   -1.0087    1.0091
   -0.2556   -0.5018
   -0.2446    0.0597
    0.9707   -0.5348
    0.9708    0.9425
    1.0000    0.9991
    1.0000    1.0000
    1.0000    1.0000

score 0 · Accepted Answer · answered Sep 16 '18 at 04:51

So I finally figured out what was going on with this. It was all about what data structures Python was storing my variables as. As such, I set all my values to 'float32' and initialized the variables being iterated. Working code is here:

Note: a lambda function is an anonymous function useful for single-line expressions

f = lambda x: 100*np.square(x[1]-np.square(x[0])) + np.square((1-x[0]))
dfx = lambda x: np.array([-400*x[0]*x[1]+400*np.power(x[0],3)+2*x[0]-2, 200*(x[1]-np.square(x[0]))], dtype='float32')
dfx11 = lambda x: -400*(x[1])+1200*np.square(x[0])+2
dfx12 = lambda x: -400*x[0]
dfx21 = lambda x: -400*x[0]
dfx22 = lambda x: 200
hessian = lambda x: np.array([[dfx11(x), dfx12(x)], [dfx21(x), dfx22(x)]], dtype='float32')
inv_hessian = lambda x: inv(hessian(x))
mag = lambda x: math.sqrt(sum(i**2 for i in x))

def newton(x, t, count, magnitude):
  xvalues=[]
  gradvalues=[]
  fvalues=[]
  temp = np.zeros((2,1))

  while magnitude > .000005:
    xvalues.append(x)
    gradvalues.append(dfx(x))
    fvalues.append(f(x))      

    deltaX = np.array(np.dot(-inv_hessian(x), dfx(x)))
    temp = np.array(x+t*deltaX)
    x = temp
    magnitude = mag(deltaX)    
    count+=1
  return xvalues, gradvalues, fvalues, count

x0, t0, alpha, beta, count = np.array([[-1.1], [1.1]]), 1, .15, .7, 1
xvalues, gradvalues, fvalues, iterations = newton(x0, t0, count, magnitude)

final_value = print('Final set of x values: ', xvalues[-1])
final_grad = print('Final gradient values: ', gradvalues[-1])
final_f = print('Final value of the object function with optimized inputs: ', fvalues[-1])
final_grad_mag = print('Final magnitude of the gradient with optimized inputs: ', mag(np.array([dfx1(xvalues[-1]), dfx2(xvalues[-1])])))
total_iterations = print('Total iterations: ', iterations
print(xvalues)

Output:

Final set of x values:  [[0.99999995]
 [0.99999987]]
Final gradient values:  [[ 9.1299416e-06]
 [-4.6193604e-06]]
Final value of the object function with optimized inputs:  [5.63044182e-14]
Final magnitude of the gradient with optimized inputs:  1.02320249276675e-05
Total iterations:  9
[array([[-1.1],
       [ 1.1]]), array([[-1.00869558],
       [ 1.00913081]]), array([[-0.25557778],
       [-0.50186648]]), array([[-0.24460602],
       [ 0.05971173]]), array([[ 0.97073805],
       [-0.53472879]]), array([[0.97083687],
       [0.94252417]]), array([[0.99999957],
       [0.99914868]]), array([[0.99999995],
       [0.99999987]])]

How to tell if Newtons-Method Fails

3 Answers3

Note: a lambda function is an anonymous function useful for single-line expressions

Linked