
I'm solving a one-dimensional non-linear equation with Newton's method. I'm trying to figure out why one of the implementations of Newton's method converges exactly, to within floating-point precision, whereas another does not.

The following algorithm does not converge:

x_{n+1} = x_n - f(x_n) / f'(x_n)

whereas the following does converge:

x_{n+1} = (x_n * f'(x_n) - f(x_n)) / f'(x_n)

You may assume that the functions f and f' are smooth and well-behaved. The best explanation I was able to come up with is that this is somehow related to what's called iterative improvement (Golub and Van Loan, 1989). Any further insight would be greatly appreciated!

Here is a simple Python example illustrating the issue:

# Python
def f(x):
    return x*x-2.

def fp(x):
    return 2.*x

xprev = 0.

# converges: exits once two successive iterates are bit-for-bit identical
x = 1. # guess
while x != xprev:
    xprev = x
    x = (x*fp(x)-f(x))/fp(x)
print(x)

# does not converge: the iterates end up alternating between two
# neighbouring doubles, so this loop never terminates
xprev = 0.
x = 1. # guess
while x != xprev:
    xprev = x
    dx = -f(x)/fp(x)
    x = x + dx
print(x)

Note: I'm aware of how floating-point numbers work (please don't post your favourite link to a website telling me to never compare two floating-point numbers). Also, I'm not looking for a solution to the problem but for an explanation as to why one of the algorithms converges and the other does not.

Update: As @uhoh pointed out, there are many cases where the second method does not converge. However, I still don't know why the second method converges so much more easily in my real-world scenario than the first. All the test cases have very simple functions f, whereas the real-world f has several hundred lines of code (which is why I don't want to post it). So maybe the complexity of f is important. If you have any additional insight into this, let me know!

hanno
    When are you assigning `Xprev` in the first algorithm? If it is before evaluating `dx` then algebraically the two are the same. Perhaps there's an overflow for very small `f'(x)`? – D Stanley Apr 07 '15 at 01:03
  • Typo. Now corrected. – hanno Apr 07 '15 at 01:04
  • `f'(x)` is typically ~1 and `x` is ~0.1. So I don't think an overflow occurs anywhere. – hanno Apr 07 '15 at 01:10
  • Algebraically your two formulas are the same, so either there's a bug in the implementation or you're never getting _exactly_ the same floating point. Typically you compare two floating-point numbers by comparing their difference to a _very_ small number. – D Stanley Apr 07 '15 at 01:16
  • Yes, typically I would compare it to a very small number, which would be fine. However, I'm still puzzled as to why the second algorithm converges at all (for many different runs with different initial `x` and `f`, i.e. it's not a coincidence of a specific set of numbers cancelling). – hanno Apr 07 '15 at 01:21
  • I added a piece of python code that illustrates the issue. – hanno Apr 07 '15 at 01:34
  • The second algorithm appears to converge because your computation of the new `x` value is typically subject to tons of cancellation. I'm too sleepy right now to come up with an example on which it diverges, but I'm fairly sure one exists. – tmyklebu Apr 07 '15 at 04:26

4 Answers


Neither method is perfect:

One situation in which both methods will tend to fail is if the root lies almost exactly midway between two consecutive floating-point numbers f1 and f2. Then both methods, having arrived at f1, will try to compute that intermediate value and have a good chance of turning up f2, and vice versa.

                         /f(x)
                       /
                     /
                   /
                 /
  f1           /
--+----------------------+------> x
           /             f2
         /
       /
     /
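
A small sketch along these lines (not from the original answer; it assumes Python 3.9+ for math.nextafter) measures where the exact root of x*x - 2 falls between its two neighbouring doubles:

from decimal import Decimal, getcontext
import math

getcontext().prec = 40                 # enough digits to resolve sub-ULP positions

lo = 1.4142135623730949                # the double just below sqrt(2)
hi = math.nextafter(lo, 2.0)           # the next representable double (Python 3.9+)

root = Decimal(2).sqrt()               # sqrt(2) to 40 significant digits
# Fractional position of the exact root inside the gap [lo, hi]:
# 0.0 = sits on lo, 0.5 = exactly midway, 1.0 = sits on hi.
pos = (root - Decimal(lo)) / (Decimal(hi) - Decimal(lo))
print(float(pos))

The closer this number is to 0.5, the easier it is for either update to keep hopping between lo and hi instead of settling on one of them.
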
Pascal Cuoq
  • Isn't that almost always the case (including in the above example)? – hanno Apr 07 '15 at 12:53
  • @hanno No, the root can also be close to a floating-point value, and that makes it easier to have `x` compute the same as `xprev`. But then again, if there are significant approximations in the computation of `f`, as pointed out by Simon, one needs to understand what function one is computing a root of. – Pascal Cuoq Apr 07 '15 at 13:40

"I'm aware of how floating point numbers work...". Perhaps the workings of floating-point arithmetic are more complicated than imagined.

This is a classic example of cycling of iterates using Newton's method. The comparison of a difference to an epsilon is "mathematical thinking" and can burn you when using floating-point arithmetic. In your example, you visit several floating-point values for x, and then you are trapped in a cycle between two numbers. The "floating-point thinking" is better formulated as the following (sorry, my preferred language is C++):

#include <set>

// F and DF stand in for the question's f and fp.
double F(double x)  { return x*x - 2.0; }
double DF(double x) { return 2.0*x; }

int main()
{
    std::set<double> visited;
    double xprev = 0.0;
    double x = 1.0;
    while (x != xprev)
    {
        xprev = x;
        double dx = -F(x)/DF(x);
        x = x + dx;
        if (visited.find(x) != visited.end())
        {
            break;  // found a cycle
        }
        visited.insert(x);
    }
}
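
For readers following the question's Python, roughly the same idea (a sketch, not part of the original answer) applied to the f from the question looks like this:

def f(x):
    return x*x - 2.

def fp(x):
    return 2.*x

visited = set()
xprev, x = 0., 1.
while x != xprev:
    xprev = x
    x = x - f(x)/fp(x)     # the update that cycles in the question
    if x in visited:
        break              # iterate already seen: we are in a cycle, stop
    visited.add(x)
print(x)                   # terminates on one of the two alternating values
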
  • I understand that, but why does this never happen for the other implementation? – hanno Apr 07 '15 at 02:01
  • (x*F'(x)-F(x))/F'(x) and x-F(x)/F'(x) are equivalent expressions using real arithmetic. They are not equivalent expressions using floating-point arithmetic. Step through with a debugger and analyze both expressions to see that they can be different. The floating-point rounding behavior of your first implementation appears to be "nice" to you. The second implementation's cycling is clearly a floating-point thing. The function F(x) is convex and you are close to a root, so theoretically Newton's should converge. But floating-point arithmetic is not real arithmetic... – Dave Eberly Apr 07 '15 at 04:27

I'm trying to figure out why one of the implementations of Newton's method is converging exactly within floating-point precision, whereas another is not.

Technically, it doesn't converge to the correct value. Try printing more digits, or using float.hex.

The first one gives

>>> print("%.16f" % x)
1.4142135623730949
>>> float.hex(x)
'0x1.6a09e667f3bccp+0'

whereas the correctly rounded value is the next floating-point value:

>>> print("%.16f" % math.sqrt(2))
1.4142135623730951
>>> float.hex(math.sqrt(2))
'0x1.6a09e667f3bcdp+0'

The second algorithm is actually alternating between the two values, so it doesn't converge.

The problem is due to catastrophic cancellation in f(x): as x*x will be very close to 2, when you subtract 2, the result will be dominated by the rounding error incurred in computing x*x.
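
A short sketch (not part of the original answer; math.nextafter needs Python 3.9+) that makes the cancellation and the resulting one-ULP hop visible:

import math

x_hi = math.sqrt(2.)              # 1.4142135623730951, the correctly rounded root
x_lo = math.nextafter(x_hi, 0.)   # 1.4142135623730949, one ULP below

for x in (x_hi, x_lo):
    f = x*x - 2.                  # about +/-4.4e-16: pure rounding error from x*x
    dx = -f / (2.*x)              # about -/+1.6e-16: more than half a ULP of x
    print(x, f, x + dx)           # x + dx rounds to the *other* neighbour each time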

Simon Byrne

I think trying to force exact equality (instead of err < small) is always going to fail frequently. In your example, for 100,000 random numbers between 1 and 10 (instead of your 2.0) the first method fails about 1/3 of the time, the second method about 1/6 of the time. I'll bet there's a way to predict that!

This takes ~30 seconds to run, and the results are cute:

def f(x, a):
    return x*x - a

def fp(x):
    return 2.*x

def A(a):
    xprev = 0.
    x = 1.
    n = 0
    while x != xprev:
        xprev = x
        x = (x * fp(x) - f(x,a)) / fp(x)
        n += 1
        if n > 100:
            return n, x
    return n, x



def B(a):
    xprev = 0.
    x = 1.
    n = 0
    while x != xprev:
        xprev = x
        dx = - f(x,a) / fp(x)
        x = x + dx
        n += 1
        if n > 100:
            return n, x
    return n, x

import numpy as np
import matplotlib.pyplot as plt


n = 100000

aa = 1. + 9. * np.random.random(n)

data_A = np.zeros((2, n))
data_B = np.zeros((2, n))

for i, a in enumerate(aa):
    data_A[:,i] = A(a)
    data_B[:,i] = B(a)

bins = np.linspace(0, 110, 12)

hist_A = np.histogram(data_A[0], bins=bins)   # histogram of iteration counts
hist_B = np.histogram(data_B[0], bins=bins)
print("A: n<10: ", hist_A[0][0], " n>=100: ", hist_A[0][-1])
print("B: n<10: ", hist_B[0][0], " n>=100: ", hist_B[0][-1])

plt.figure()
plt.subplot(1,2,1)
plt.scatter(aa, data_A[0])
plt.subplot(1,2,2)
plt.scatter(aa, data_B[0])
plt.show()

[Scatter plots of iteration count versus a for method A (left panel) and method B (right panel)]

uhoh
  • That's what I'd like to understand! In the actual case I'm interested in (it's way too long to post), the first method seems to converge every single time in millions of test cases. Maybe it depends on the complexity of the function `f`? – hanno Apr 07 '15 at 03:14
  • Could be. I just added a plot to the program; interestingly, the second method tends NOT to fail between 4 and 8 - go figure! I wish I could post the graphic (I don't have a high enough score); can someone else? – uhoh Apr 07 '15 at 03:20