
I've written some code calculating a distance matrix using repa:

distance :: Int -> Int -> Mat -> Double
distance aindx bindx arr = let a = slice arr (Any :. aindx :. All)
                               b = slice arr (Any :. bindx :. All)
                               sqdiff = R.map (\x -> x*x) $ R.zipWith (-) a b 
                            in sqrt $ sumAllS sqdiff

buildDistanceMatrix :: Mat -> Mat 
buildDistanceMatrix m = let (Z :. height :. width) = R.extent m
                            cords = fromListUnboxed (Z :. (height * height) )  [ (x,y) | x <- [0..height-1], y <- [0..height-1]]
                            dist = R.smap (\(a,b) -> distance a b m) cords
                            dmat = R.reshape (Z :. height  :. height ) dist
                         in R.computeS dmat

It seems to work. But then I added a QuickCheck property:

prop_distmat :: Double -> Bool
prop_distmat d = let dvec = [d,0,0,0]
                     dmat = R.fromListUnboxed (Z :. (2::Int) :. (2::Int)) dvec
                     dist = buildDistanceMatrix dmat
                  in (R.toList dist) == [0.0, d, d, 0.0 ]

In other words, two points separated by a distance D should yield a distance matrix that looks like [0,D,D,0]. And in my ad hoc manual testing, it does. But QuickCheck quickly found that a distance of 5.0e-324 yields a distance matrix of [0,0,0,0]:

distance matrix  *** Failed! Falsifiable (after 2 tests and 1074 shrinks):    
5.0e-324

Is this just because of the precision of Doubles? Do I need to clamp the possible values that QuickCheck will send? Or is this a real bug?

nont

2 Answers


You are testing floating-point numbers for equality, which should generally be avoided (in any language; this isn't Haskell-specific). Here, 5.0e-324 is the smallest positive Double, so squaring it underflows to 0 and the computed distance comes out as sqrt 0 = 0. You'll also get an overflow to infinity with large Doubles. And sqrt (x*x) == x doesn't hold in general even for those Doubles where you don't get overflow or underflow. So you need to both replace == with a check that the difference is at most some reasonable epsilon, and limit the possible values (or check for overflow in the property).
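
One possible shape for the adjusted property (a sketch only: it assumes your buildDistanceMatrix is in scope, and both the 1e-9 tolerance and the (0, 1e100) generation range are arbitrary choices):

import Test.QuickCheck
import qualified Data.Array.Repa as R
import Data.Array.Repa (Z (..), (:.) (..))

-- Approximate equality with a tolerance instead of ==; the 1e-9 is arbitrary.
approxEq :: Double -> Double -> Bool
approxEq x y = abs (x - y) <= 1e-9 * max 1 (max (abs x) (abs y))

-- Generate only non-negative distances in a range where d*d neither
-- underflows to 0 nor overflows to Infinity.
prop_distmat :: Property
prop_distmat = forAll (choose (0, 1e100 :: Double)) $ \d ->
  let dvec = [d, 0, 0, 0]
      dmat = R.fromListUnboxed (Z :. (2 :: Int) :. (2 :: Int)) dvec
      dist = buildDistanceMatrix dmat
   in and (zipWith approxEq (R.toList dist) [0.0, d, d, 0.0])

Alternatively, keep the plain Double argument and discard the extreme inputs with ==>, or check explicitly for 0 and Infinity in the result.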

Alexey Romanov

Calculating the L2 norm of a vector naively can underflow or overflow long before the square root is applied. I quote from someone who knows: "The most robust calculation of the two norm in Fortran had over 200 lines of code, but that was 25 years ago". I suggest searching for a Fortran implementation (the BLAS routine dnrm2 is the classic example) and then using that knowledge of what can go wrong to inform your Haskell implementation. Numerics are tricky; the good news is that most of these problems were probably solved in Fortran about 50 years ago.
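
To make the scaling idea concrete, here is a minimal sketch (the function names are mine, and it assumes a non-empty vector): factor out the largest magnitude before squaring, so the squared terms stay well inside the Double range.

-- Naive two-norm: the intermediate x*x can underflow to 0 or overflow to Infinity.
l2Naive :: [Double] -> Double
l2Naive xs = sqrt (sum [x * x | x <- xs])

-- Scaled two-norm: divide by the largest magnitude first, so every ratio
-- x/m lies in [-1, 1]; assumes xs is non-empty.
l2Scaled :: [Double] -> Double
l2Scaled xs
  | m == 0    = 0
  | otherwise = m * sqrt (sum [(x / m) * (x / m) | x <- xs])
  where
    m = maximum (map abs xs)

In GHCi, l2Naive [5.0e-324, 0] gives 0.0 while l2Scaled [5.0e-324, 0] gives back 5.0e-324, which is exactly the failure QuickCheck found.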

idontgetoutmuch
  • good to know. I'm kinda new to numerics - I didn't even know that what I was doing was called "the L2 norm" – nont Aug 21 '14 at 13:40