
I have defined a typeclass Differentiable to be implemented by any type which can operate on infinitesimals. Here is an example:

class Fractional a => Differentiable a where
    dif :: (a -> a) -> (a -> a)
    difs :: (a -> a) -> [a -> a]
    difs = iterate dif

instance Differentiable Double where
    dif f x = (f (x + dx) - f x) / dx
        where dx = 0.000001

func :: Double -> Double
func = exp

I have also defined a simple Double -> Double function to differentiate.

But when I test this in GHCi, this happens:

... $ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/  :? for help
Prelude> :l testing
[1 of 1] Compiling Main             ( testing.hs, interpreted )
Ok, one module loaded.
*Main> :t func
func :: Double -> Double
*Main> derivatives = difs func
*Main> :t derivatives
derivatives :: [Double -> Double]
*Main> terms = map (\f -> f 0) derivatives
*Main> :t terms
terms :: [Double]
*Main> take 5 terms
[1.0,1.0000004999621837,1.000088900582341,-222.0446049250313,4.440892098500626e8]
*Main>

The approximations to the nth derivative of e^x at x = 0 are:

[1.0,1.0000004999621837,1.000088900582341,-222.0446049250313,4.440892098500626e8]

The first and second derivatives are perfectly reasonable approximations given the setup, but suddenly the third derivative of func at 0 is... -222.0446049250313! HOW!!?

  • Looks like rounding errors; the next one has an even larger error. If you divide, the error easily "blows up". – Willem Van Onsem Nov 05 '20 at 23:21
  • I'm inclined to agree. Floating point numbers are spooky, and not very good representations of reals if you scrutinize closely. At each step you take the previous approximation and multiply its importance by a million. You soon run out of precision bits. – amalloy Nov 05 '20 at 23:30
  • @WillemVanOnsem The fact that the derivatives lost accuracy in the way they did is incredibly spooky and is certainly something that I wasn't expecting. But what shocked me the most is that the 3rd derivative evaluated as negative. Approximating the derivative of an exponential with a positive dx should always result in overestimations, and -222 is far from an overestimation. –  Nov 05 '20 at 23:47
  • @Vye: well division is *numerically unstable*. That means that very small (rounding) errors, or errors when evaluating `exp` can easily blow up. This matters since `exp`, etc. are in hardware often implemented with interpolation and lookup-tables, so it is not perfectly continuous. – Willem Van Onsem Nov 05 '20 at 23:53

2 Answers


The method you're using here is a finite difference method of 1st-order accuracy.

Layman's translation: it works, but is pretty rubbish numerically speaking. Because it's only 1st-order accurate, you need really small steps to get good accuracy even with exact real arithmetic. You did choose a small step size, so that's fine, but a small step size brings in another problem: rounding errors. You need to take the difference f (x+δx) - f x with small δx, meaning the difference is small while the individual values may be large. That always brings out floating-point inaccuracy – consider for example

Prelude> (1 + pi*1e-13) - 1
3.141931159689193e-13

That might not hurt much on its own, but since you then need to divide by δx, you boost up the error.
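To see that amplification concretely, here is a small sketch (the names delta, approx, and err are mine, not from the answer) that divides the inexact difference above by the step, finite-difference style:

```haskell
-- Dividing the slightly-wrong difference from above by the tiny step
-- turns a ~1e-16 absolute error into a much larger error in the result.
delta :: Double
delta = 1e-13

approx :: Double
approx = ((1 + pi * delta) - 1) / delta  -- should recover pi

err :: Double
err = abs (approx - pi)  -- roughly 3e-4 instead of ~0
```

The subtraction loses about four decimal digits of pi, and the division by 1e-13 promotes that loss straight into the final answer.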

This issue only gets worse/compounded as you go to higher derivatives, because now each of f' x and f' (x+δx) already carries a (non-identical!) boosted error, so taking the difference and boosting again is a clear recipe for disaster.

The simplest way to remediate the problem is to switch to a 2nd-order accurate method, the obvious being central difference. Then you can make the step a lot bigger, and thus largely avoid rounding issues:

Prelude> let dif f x = (f (x + δx) - f (x - δx)) / (2*δx) where δx = 1e-3
Prelude> take 8 $ ($0) <$> iterate dif exp
[1.0,1.0000001666666813,1.0000003333454632,1.0000004990740052,0.9999917560676863,0.9957312752106873,8.673617379884035,7806.255641895632]

You see the first couple of derivatives are good now, but eventually it also becomes unstable – and this will happen with any FD method as you iterate it. Iterating FD is not really a good approach anyway: every evaluation of the n-th derivative requires 2 evaluations of the (n−1)-th, so the cost is exponential in the derivative order.
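That exponential cost can be made visible by pairing each function value with a count of underlying evaluations – a sketch with hypothetical names (Counted, difC, evalCount), not part of the answer's code:

```haskell
-- A "counted" function returns its value together with how many times
-- the underlying function was evaluated to produce it.
type Counted = Double -> (Int, Double)

countedExp :: Counted
countedExp x = (1, exp x)

-- Central difference that also propagates the evaluation counts.
difC :: Counted -> Counted
difC f x = (cl + cr, (l - r) / (2 * dx))
  where
    (cl, l) = f (x + dx)
    (cr, r) = f (x - dx)
    dx      = 1e-3

-- Number of exp evaluations needed for the n-th iterated difference: 2^n.
evalCount :: Int -> Int
evalCount n = fst ((iterate difC countedExp !! n) 0)
```

Already at n = 10 a single point costs 1024 evaluations of exp.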

A better approach to approximating the n-th derivative of an opaque function is to fit an n-th order polynomial to it and differentiate that symbolically/automatically. Or, if the function is not opaque, differentiate it symbolically/automatically.
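For the non-opaque case, "differentiate it automatically" can be sketched with forward-mode dual numbers – exact up to rounding, with no step size at all. All names here (Dual, expD, deriv) are my own illustration, not from any particular AD library:

```haskell
-- A dual number carries a value and its derivative; arithmetic on duals
-- applies the sum and product rules, so derivatives come out exactly.
data Dual = Dual Double Double  -- Dual value derivative

instance Num Dual where
  Dual a a' + Dual b b' = Dual (a + b) (a' + b')
  Dual a a' * Dual b b' = Dual (a * b) (a' * b + a * b')
  negate (Dual a a')    = Dual (negate a) (negate a')
  fromInteger n         = Dual (fromInteger n) 0
  abs    (Dual a a')    = Dual (abs a) (a' * signum a)
  signum (Dual a _)     = Dual (signum a) 0

-- Chain rule for exp: (exp u)' = exp u * u'
expD :: Dual -> Dual
expD (Dual a a') = Dual (exp a) (exp a * a')

-- First derivative of a dual-polymorphic function, with no δx to tune.
deriv :: (Dual -> Dual) -> Double -> Double
deriv f x = let Dual _ d = f (Dual x 1) in d
```

Here deriv expD 0 gives exactly 1.0, and deriv (\x -> x * x) 3 gives exactly 6.0 – no cancellation, no amplification. The `ad` package on Hackage provides an industrial-strength version of this idea.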

leftaroundabout

tl;dr: the dx^n denominator gets small exponentially quickly in n, which means that even small errors in the numerator get blown out of proportion.

Let's do some equational reasoning on the first "bad" approximation, the third derivative.

dif (dif (dif exp))
= { definition of dif }
dif (dif (\x -> (exp (x+dx) - exp x)/dx))
= { definition of dif }
dif (\y -> ((\x -> (exp (x+dx) - exp x)/dx) (y+dx)
          - (\x -> (exp (x+dx) - exp x)/dx) y
           )/dx)
= { questionable algebra }
dif (\y -> (exp (y + 2*dx) - 2*exp (y + dx) + exp y)/dx^2)
= { alpha }
dif (\x -> (exp (x + 2*dx) - 2*exp (x + dx) + exp x)/dx^2)
= { definition of dif and questionable algebra }
\x -> (exp (x + 3*dx) - 3*exp (x + 2*dx) + 3*exp (x + dx) - exp x)/dx^3

Hopefully by now you can see the pattern we're getting into: as we take more and more derivatives, the error in the numerator gets worse (because we are computing exp farther and farther away from the original point – x + 3*dx is three times as far away, for example) while the sensitivity to error in the denominator gets higher (because we are dividing by dx^n for the nth derivative). By the third derivative, these two factors become untenable:

> exp (3*dx) - 3*exp (2*dx) + 3*exp (dx) - exp 0
-4.440892098500626e-16
> dx^3
9.999999999999999e-19

So you can see that, although the error in the numerator is only about 5e-16, the sensitivity to error in the denominator is so high that you start to see nonsensical answers.
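The pattern generalizes: iterating dif n times expands (up to floating-point reassociation) into an alternating binomial sum. nthDiff below is my own helper, written only to illustrate that formula:

```haskell
-- n-th forward difference quotient at x:
--   (Σ_{k=0..n} (-1)^k * C(n,k) * f (x + (n-k)*dx)) / dx^n
nthDiff :: Int -> (Double -> Double) -> Double -> Double
nthDiff n f x =
    sum [ (-1) ^ k * fromIntegral (choose n k) * f (x + fromIntegral (n - k) * dx)
        | k <- [0 .. n] ]
      / dx ^ n
  where
    dx = 1e-6
    choose m r = product [m - r + 1 .. m] `div` product [1 .. r]
```

With dx = 1e-6 as in the question, nthDiff 1 exp 0 is still a fine approximation of 1, while nthDiff 3 exp 0 is dominated by exactly the rounding-noise-over-dx^3 effect shown above.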

Daniel Wagner