
I've written some code calculating a distance matrix using repa:

distance :: Int -> Int -> Mat -> Double
distance aindx bindx arr = let a = slice arr (Any :. aindx :. All)
                               b = slice arr (Any :. bindx :. All)
                               sqdiff = R.map (\x -> x*x) $ R.zipWith (-) a b 
                            in sqrt $ sumAllS sqdiff

buildDistanceMatrix :: Mat -> Mat 
buildDistanceMatrix m = let (Z :. height :. width) = R.extent m
                            cords = fromListUnboxed (Z :. (height * height) )  [ (x,y) | x <- [0..height-1], y <- [0..height-1]]
                            dist = R.smap (\(a,b) -> distance a b m) cords
                            dmat = R.reshape (Z :. height  :. height ) dist
                         in R.computeS dmat

It seems to work. But then I added a QuickCheck property:

prop_distmat :: Double -> Bool
prop_distmat d = let dvec = [d,0,0,0]
                     dmat = R.fromListUnboxed (Z :. (2::Int) :. (2::Int)) dvec
                     dist = buildDistanceMatrix dmat
                  in (R.toList dist) == [0.0, d, d, 0.0 ]

In other words, two points separated by a distance D should yield a distance matrix that looks like [0,D,D,0]. And in my ad hoc manual testing, it does. But QuickCheck quickly found that a distance of 5.0e-324 yields a distance matrix of [0,0,0,0]:

distance matrix  *** Failed! Falsifiable (after 2 tests and 1074 shrinks):    
5.0e-324

Is this just because of the precision of Doubles? Do I need to clamp the possible values that QuickCheck will send? Or is this a real bug?

nont

2 Answers


You are testing floating-point numbers for equality, which should generally be avoided (in any language; this isn't Haskell-specific). Here, 5.0e-324 is the smallest positive Double, so squaring it underflows to 0 and the computed distance comes out as sqrt 0 = 0. You'll also get an overflow to infinity with large Doubles. And sqrt (x*x) == x doesn't hold in general even for those Doubles where you don't get overflow or underflow. So you need to both replace == with a check that the difference is at most some reasonable epsilon, and limit the possible values (or check for overflow in the property).
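
One possible shape for the adjusted property (a sketch only: it assumes your buildDistanceMatrix is in scope, and both the 1e-9 tolerance and the (0, 1e100) generation range are arbitrary choices):

import Test.QuickCheck
import qualified Data.Array.Repa as R
import Data.Array.Repa (Z (..), (:.) (..))

-- Approximate equality with a tolerance instead of ==; the 1e-9 is arbitrary.
approxEq :: Double -> Double -> Bool
approxEq x y = abs (x - y) <= 1e-9 * max 1 (max (abs x) (abs y))

-- Generate only non-negative distances in a range where d*d neither
-- underflows to 0 nor overflows to Infinity.
prop_distmat :: Property
prop_distmat = forAll (choose (0, 1e100 :: Double)) $ \d ->
  let dvec = [d, 0, 0, 0]
      dmat = R.fromListUnboxed (Z :. (2 :: Int) :. (2 :: Int)) dvec
      dist = buildDistanceMatrix dmat
   in and (zipWith approxEq (R.toList dist) [0.0, d, d, 0.0])

Alternatively, keep the plain Double argument and discard the extreme inputs with ==>, or check explicitly for 0 and Infinity in the result.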

Alexey Romanov

Calculating the L2 norm of a vector naively can underflow or overflow long before the square root is applied. I quote from someone who knows: "The most robust calculation of the two norm in Fortran had over 200 lines of code, but that was 25 years ago". I suggest searching for a Fortran implementation (the BLAS routine dnrm2 is the classic example) and then using that knowledge of what can go wrong to inform your Haskell implementation. Numerics are tricky; the good news is that most of these problems were probably solved in Fortran about 50 years ago.
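
To make the scaling idea concrete, here is a minimal sketch (the function names are mine, and it assumes a non-empty vector): factor out the largest magnitude before squaring, so the squared terms stay well inside the Double range.

-- Naive two-norm: the intermediate x*x can underflow to 0 or overflow to Infinity.
l2Naive :: [Double] -> Double
l2Naive xs = sqrt (sum [x * x | x <- xs])

-- Scaled two-norm: divide by the largest magnitude first, so every ratio
-- x/m lies in [-1, 1]; assumes xs is non-empty.
l2Scaled :: [Double] -> Double
l2Scaled xs
  | m == 0    = 0
  | otherwise = m * sqrt (sum [(x / m) * (x / m) | x <- xs])
  where
    m = maximum (map abs xs)

In GHCi, l2Naive [5.0e-324, 0] gives 0.0 while l2Scaled [5.0e-324, 0] gives back 5.0e-324, which is exactly the failure QuickCheck found.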

idontgetoutmuch
  • good to know. I'm kinda new to numerics - I didn't even know that what I was doing was called "the L2 norm" – nont Aug 21 '14 at 13:40