I am learning Haskell and recently worked on a small project that mimics the way a human does arithmetic by hand, so that I can keep as many decimal places as I want and the results are not subject to floating-point precision issues. I also tried to use QuickCheck to test my program. The problem is that QuickCheck sometimes gets very slow: a run takes a very long time and uses a lot of CPU.

The program is quite long by now, so I have picked the most relevant lines and rearranged them below:

data D 
  = D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9
  deriving (Enum, Bounded, Eq, Ord, Show)

newtype N = N { getDs :: [D] }

nFromList :: [D] -> N
-- drops leading zeros and makes an N value

instance Enum N where
  -- implementation omitted here --

instance Eq N where
  -- implementation omitted here --

instance Ord N where
  -- implementation omitted here --

instance Num N where 
  -- implementation omitted here --

-- this is a fraction data, numerator/denominator/sign
data F = F N N Bool 

getDenom :: F -> N 
getDenom (F _ d _) = d 
-- I can make data F a record but this was written long ago. 
-- And the test case for this runs slowly. 

instance Arbitrary D where
  arbitrary = elements [minBound .. maxBound]

instance Arbitrary N where
  arbitrary = do
    l <- getSize   ---------- impl#1 fast QuickCheck run
    -- l <- choose (0, 10) :: Gen Int --------- impl#2 slow QuickCheck run
    ds <- replicateM l arbitrary
    return $ nFromList ds

instance Arbitrary F where
  arbitrary = do
    num <- arbitrary
    denom <- arbitrary
    sign <- arbitrary
    let denom' = max denom 1
    return $ constructF num denom' sign

constructF :: N -> N -> Bool -> F
-- it validates the denominator, reduces the fraction, etc.

prop_PosDenom :: F -> Bool
prop_PosDenom f = (getDenom f) > 0

main = do 
  quickCheck prop_PosDenom

The issue is in the instance Arbitrary N definition. If I use impl#1, everything works out great and the test cases take less than 1 second to run. But if I use impl#2, the run hangs at random points and takes a very long time, with 100% CPU usage on one core.

I tried using Debug.Trace everywhere, but the delay does not appear to be anywhere in my code. It looks as if QuickCheck generates two arbitrary F values and then there is a delay before prop_PosDenom is called.

Thank you in advance for any suggestions.

dhu
  • Consider using the `Arbitrary` instance for `[D]` to construct your `N` instances. It's possible that your implementation is simply producing offensively large numbers that take forever to compute. – Silvio Mayolo Jun 30 '21 at 22:52
  • The fast case is where N can go arbitrarily large and the slow case is where N is limited to 10 digits. I can confirm that when I print with `trace` – dhu Jun 30 '21 at 22:55
  • But let me try `arbitrary` for [D] – dhu Jun 30 '21 at 22:59
  • Strictly speaking `getSize` can become arbitrarily large, but practically speaking quickcheck will try all the small sizes before it tries all the bigger sizes. Chances are that with `getSize` you are getting small numbers (try `verboseCheck` to see the test cases it actually is generating). Also try comparing e.g. `choose (0, 2)` and `choose (0, 4)` and `choose (0, 6)` to see how performance degrades with increasing size (I am betting it degrades pretty quickly) – user2407038 Jun 30 '21 at 23:31
  • Put it another way, try to call `prop_PosDenom` on a `F` whose numerator and denominator both have 10 (random) digits. How long does this take? – user2407038 Jun 30 '21 at 23:34
  • Hmm... looks like it is actually my implementation that is costly. Some short/small inputs do trigger some crazy computations that very large numbers don't trigger. – dhu Jul 01 '21 at 00:56
  • Do you know about [`Integer`](https://hackage.haskell.org/package/base-4.15.0.0/docs/Prelude.html#t:Integer) and [`Rational`](https://hackage.haskell.org/package/base-4.15.0.0/docs/Prelude.html#t:Rational)? – Daniel Wagner Jul 01 '21 at 19:03
  • Yes I do. I wanted to do it this way as an exercise. For some reason I find it fascinating to program these out in Haskell. Maybe once I get it done this way, I will try using what you suggested. – dhu Jul 03 '21 at 04:15
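
The first comment's suggestion of building `N` from the `Arbitrary` instance for `[D]` might look roughly like the sketch below. It is only an illustration under assumptions: `nFromList` is assumed to do nothing more than strip leading `D0` digits, and `Eq`/`Show` are derived for `N` for brevity, whereas the question uses hand-written instances. The `shrink` definition is an addition that is not in the original code.

```haskell
import Test.QuickCheck

data D
  = D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9
  deriving (Enum, Bounded, Eq, Ord, Show)

-- Eq and Show are derived here only to keep the sketch short;
-- the question defines its own instances.
newtype N = N { getDs :: [D] }
  deriving (Eq, Show)

-- Assumed behaviour (the question omits the body): drop leading zeros.
nFromList :: [D] -> N
nFromList = N . dropWhile (== D0)

instance Arbitrary D where
  arbitrary = elements [minBound .. maxBound]

instance Arbitrary N where
  -- Reuse QuickCheck's list generator; its length is bounded by the
  -- size parameter, so early test cases stay short.
  arbitrary = nFromList <$> arbitrary
  -- Shrink by shrinking the underlying digit list.
  shrink = map nFromList . shrink . getDs
```

With this in scope, `sample (arbitrary :: Gen N)` in GHCi prints a handful of generated values, which is a quick way to see what digit lengths the generator actually produces.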

1 Answer

I found out the reason why it was slow. It was my code. There was no reason to doubt QuickCheck.

Before I looked into the details (a lot of this code was written a while ago), I assumed that simpler/shorter inputs would run faster than larger/longer inputs. That's why I expected getSize to be the version with the longer run time. In fact, in my code it is the short numbers that trigger the offending computations.

I had to define a gcd function for type N, which is used for fraction reduction. I defined it using the subtraction-based form of Euclid's algorithm, as follows:

gcdN :: N -> N -> N 
gcdN n m
  | n == 0 || m == 0 = 0
  | n == m = n
  | n > m = gcdN (subN n m) n
  | otherwise = gcdN (subN m n) n

This algorithm is slow when n >> m or m >> n, because the number of subtractions grows with the ratio of the two arguments, and that is exactly the case here. One of my sample tests,

gcdN (N [D4]) (N [D2, D8, D8, D7, D4, D7, D6, D8, D1, D2])

took forever.
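
To give a feel for the scale, here is a small sketch that counts steps rather than running them. It uses `Integer` as a stand-in for `N` (an assumption made purely for illustration), and the helper names `subSteps` and `divSteps` are hypothetical. The subtraction count is for a textbook subtraction-based gcd, so it approximates, rather than exactly matches, the number of calls made by the `gcdN` above.

```haskell
-- Integer stands in for the question's N type (illustration only).

-- Number of remainder computations a division-based gcd performs.
divSteps :: Integer -> Integer -> Integer
divSteps _ 0 = 0
divSteps n m = 1 + divSteps m (n `mod` m)

-- Number of subtractions a textbook subtraction-based gcd performs:
-- a division step with quotient q corresponds to q repeated subtractions.
subSteps :: Integer -> Integer -> Integer
subSteps _ 0 = 0
subSteps n m = q + subSteps m r
  where
    (q, r) = n `divMod` m
```

For the sample above, `subSteps 4 2887476812` evaluates to 721869203, while `divSteps 4 2887476812` evaluates to 2. Each of those subtractions would be carried out on digit lists, which is consistent with the call appearing to hang.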

After I replaced it with the division-based Euclidean algorithm, the same call returns instantly.

gcdN :: N -> N -> N
gcdN n m
  | n == 0 = m
  | m == 0 = n
  | m == n = n
  | n > m = case divModN n m of -- using my divModN :: N -> N -> Maybe (N, N)
    Nothing -> n -- when m is 0
    Just (_, r) -> gcdN m r -- the quotient is not needed
  | otherwise = gcdN m n
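
One way to catch this kind of problem earlier is a property that deliberately feeds lopsided operands to the gcd and compares the result with a trusted reference. The sketch below is an assumption-laden illustration, not the code from my project: it works on `Integer` rather than `N`, and `gcdDiv`, `lopsided`, and `prop_gcdMatchesPrelude` are hypothetical names. It uses the Prelude's `gcd` as the reference.

```haskell
import Test.QuickCheck

-- Division-based gcd on Integer, mirroring the idea of the revised gcdN.
gcdDiv :: Integer -> Integer -> Integer
gcdDiv n 0 = n
gcdDiv n m = gcdDiv m (n `mod` m)

-- Hypothetical generator: pairs a one-digit operand with a ten-digit one,
-- the shape of input that made the subtraction-based version crawl.
lopsided :: Gen (Integer, Integer)
lopsided = do
  small <- choose (1, 9)
  big <- choose (10 ^ (9 :: Int), 10 ^ (10 :: Int) - 1)
  return (small, big)

-- The result should agree with the Prelude's gcd on every generated pair.
prop_gcdMatchesPrelude :: Property
prop_gcdMatchesPrelude =
  forAll lopsided $ \(a, b) -> gcdDiv a b === gcd a b
```

Running `quickCheck prop_gcdMatchesPrelude` exercises the lopsided case on every test, instead of relying on the default generator to stumble on it.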

This experience greatly increased my confidence in QuickCheck.

Greatest Common Divisor wiki page

dhu