15

I have a Haskell data type like

data Mytype
  = C1
  | C2 Char
  | C3 Int String

If I case on a Mytype and forget to handle one of the cases, GHC gives me a warning (exhaustiveness check).

I now want to write a QuickCheck Arbitrary instance to generate MyTypes like:

instance Arbitrary Mytype where
  arbitrary = do
    n <- choose (1, 3 :: Int)
    case n of
      1 -> C1
      2 -> C2 <$> arbitrary
      3 -> C3 <$> arbitrary <*> someCustomGen

The problem with this is that I can add a new alternative to Mytype and forget to update the Arbitrary instance, thus having my tests not test that alternative.

I would like to find a way of using GHC's exhaustiveness checker to remind me of forgotten cases in my Arbitrary instance.

The best I've come up with is

arbitrary = do
  x <- elements [C1, C2 undefined, C3 undefined undefined]
  case x of
    C1     -> C1
    C2 _   -> C2 <$> arbitrary
    C3 _ _ -> C3 <$> arbitrary <*> someCustomGen

But it doesn't really feel elegant.

I intuitively feel that there's no 100% clean solution to this, but would appreciate anything that reduces the chance of forgetting such cases - especially in a big project where code and tests are separated.

nh2
  • 24,526
  • 11
  • 79
  • 128
  • 2
    Just a note: One can write `C2{}` instead of `C2 _` and so on which at least makes the syntax a bit nicer. – nh2 Aug 28 '14 at 19:59
  • 2
    Note that the `undefined` thing will fail if the constructor is strict. – Lambda Fairy Sep 04 '14 at 08:23
  • Is there some reason you don't want to just derive the Arbitrary instance automatically with TH? – Dan Burton Sep 04 '14 at 18:52
  • @DanBurton Yes, I wanted to hint at that with my use of `someCustomGen`. If I have some invariant not captured by the type system, or otherwise want to influence with what probability alternatives are generated (to make it more likely to generate non-trivial cases), I have to customise parts of the Arbitrary instance (e.g. with QuickCheck's `frequency` function). – nh2 Sep 04 '14 at 21:26
  • Great question! I have opened https://github.com/nick8325/quickcheck/issues/267 to ask what the recommended solution is today. – sjakobi Aug 31 '19 at 15:16

4 Answers4

2

I implemented a solution with TemplateHaskell, you can find a prototype at https://gist.github.com/nh2/d982e2ca4280a03364a8. With this you can write:

instance Arbitrary Mytype where
  arbitrary = oneof $(exhaustivenessCheck ''Mytype [|
      [ pure C1
      , C2 <$> arbitrary
      , C3 <$> arbitrary <*> arbitrary
      ]
    |])

It works like this: You give it a type name (like ''Mytype) and an expression (in my case a list of arbitrary style Gens). It gets the list of all constructors for that type name and checks whether the expression contains all of these constructors at least once. If you just added a constructor but forgot to add it to the Arbitrary instance, this function will warn you at compile time.

This is how it's implemented with TH:

exhaustivenessCheck :: Name -> Q Exp -> Q Exp
exhaustivenessCheck tyName qList = do
  tyInfo <- reify tyName
  let conNames = case tyInfo of
        TyConI (DataD _cxt _name _tyVarBndrs cons _derives) -> map conNameOf cons
        _ -> fail "exhaustivenessCheck: Can only handle simple data declarations"

  list <- qList
  case list of
    input@(ListE l) -> do
      -- We could be more specific by searching for `ConE`s in `l`
      let cons = toListOf tinplate l :: [Name]
      case filter (`notElem` cons) conNames of
        [] -> return input
        missings -> fail $ "exhaustivenessCheck: missing case: " ++ show missings
    _ -> fail "exhaustivenessCheck: argument must be a list"

I'm using GHC.Generics to easily traverse the syntax tree of the Exp: With toListOf tinplate exp :: [Name] (from lens) I can easily find all Names in the whole exp.

I was surprised that the types from Language.Haskell.TH do not have Generic instances, and neither (with current GHC 7.8) do Integer or Word8 - Generic instances for these are required because they appear in Exp. So I added them as orphan instances (for most things, StandaloneDeriving does it but for primitive types like Integer I had to copy-paste instances as Int has them).

The solution is not perfect because it doesn't use the exhaustiveness checker like case does, but as we agree, that's not possible while staying DRY, and this TH solution is DRY.

One possible improvement/alternative would be to write a TH function that does this check for all Arbitrary instances in a whole module at once instead of calling exhaustivenessCheck inside each Arbitrary instance.

nh2
  • 24,526
  • 11
  • 79
  • 128
2

You want to ensure that your code behaves in a particular way; the simplest way to check the behaviour of code is to test it.

In this case, the desired behaviour is that each constructor gets reasonable coverage in tests. We can check that with a simple test:

allCons xs = length xs > 100 ==> length constructors == 3
             where constructors = nubBy eqCons xs
                   eqCons  C1       C1      = True
                   eqCons  C1       _       = False
                   eqCons (C2 _)   (C2 _)   = True
                   eqCons (C2 _)    _       = False
                   eqCons (C3 _ _) (C3 _ _) = True
                   eqCons (C3 _ _)  _       = False

This is pretty naive, but it's a good first shot. Its advantages:

  • eqCons will trigger an exhaustiveness warning if new constructors are added, which is what you want
  • It checks that your instance is handling all constructors, which is what you want
  • It also checks that all constructors are actually generated with some useful probability (in this case at least 1%)
  • It also checks that your instance is usable, eg. doesn't hang

Its disadvantages:

  • Requires a large amount of test data, in order to filter out those with length > 100
  • eqCons is quite verbose, since a catch-all eqCons _ _ = False would bypass the exhaustiveness check
  • Uses magic numbers 100 and 3
  • Not very generic

There are ways to improve this, eg. we can compute the constructors using the Data.Data module:

allCons xs = sufficient ==> length constructors == consCount
             where sufficient   = length xs > 100 * consCount
                   constructors = length . nub . map toConstr $ xs
                   consCount    = dataTypeConstrs (head xs)

This loses the compile-time exhaustiveness check, but it's redundant as long as we test regularly and our code has become more generic.

If we really want the exhaustiveness check, there are a few places where we could shoe-horn it back in:

allCons xs = sufficient ==> length constructors == consCount
             where sufficient   = length xs > 100 * consCount
                   constructors = length . nub . map toConstr $ xs
                   consCount    = length . dataTypeConstrs $ case head xs of
                                                                  x@(C1)     -> x
                                                                  x@(C2 _)   -> x
                                                                  x@(C3 _ _) -> x

Notice that we use consCount to eliminate the magic 3 completely. The magic 100 (which determined the minimum required frequency of a constructor) now scales with consCount, but that just requires even more test data!

We can solve that quite easily using a newtype:

consCount = length (dataTypeConstrs C1)

newtype MyTypeList = MTL [MyType] deriving (Eq,Show)

instance Arbitrary MyTypeList where
  arbitrary = MTL <$> vectorOf (100 * consCount) arbitrary
  shrink (MTL xs) = MTL (shrink <$> xs)

allCons (MTL xs) = length constructors == consCount
                   where constructors = length . nub . map toConstr $ xs

We can put a simple exhaustiveness check in there somewhere if we like, eg.

instance Arbitrary MyTypeList where
  arbitrary = do x <- arbitrary
                 MTL <$> vectorOf (100 * consCount) getT
              where getT = do x <- arbitrary
                              return $ case x of
                                            C1     -> x
                                            C2 _   -> x
                                            C3 _ _ -> x
  shrink (MTL xs) = MTL (shrink <$> xs)
Warbo
  • 2,611
  • 1
  • 29
  • 23
1

Here I exploit an unused variable _x. This is not really more elegant than your solution, though.

instance Arbitrary Mytype where
  arbitrary = do
    let _x = case _x of C1 -> _x ; C2 _ -> _x ; C3 _ _ -> _x
    n <- choose (1, 3 :: Int)
    case n of
      1 -> C1
      2 -> C2 <$> arbitrary
      3 -> C3 <$> arbitrary <*> someCustomGen

Of course, one has to keep the last case coherent with the dummy definition of _x, so it is not completely DRY.

Alternatively, one might exploit Template Haskell to build a compile-time assert checking that the constructors in Data.Data.dataTypeOf are the expected ones. This assert has to be kept coherent with the Arbitrary instance, so this is not completely DRY either.

If you do not need custom generators, I believe Data.Data can be exploited to generate Arbitrary instances via Template Haskell (I think I saw some code doing exactly that, but I can't remember where). In this way, there's no chance the instance can miss a constructor.

chi
  • 111,837
  • 3
  • 133
  • 218
  • Another possibility is to use `GHC.Generics` to derive the arbitrary instance. `GHC.Generics` is very well suited in cases where you can say what to do for Sums (Constructors of a data type) and Products (Fields of a data constructor), which this one should be an instance of. – bennofs Aug 28 '14 at 19:37
  • @bennofs Unfortunately `GHC.Generics` only help if you want to use the default `arbitrary` in all fields. In that case, yes, they are great for this purpose, but in my case it is important that I can customise the instance (I tried hinting at that by including `someCustomGen`). – nh2 Aug 28 '14 at 19:56
  • The self-recursive way you propose is certainly nicer than my `undefined`s. – nh2 Aug 28 '14 at 20:01
  • If I add a `C4` constructor how is the above code going to catch the missing case? – Daniel Aug 29 '14 at 17:10
  • @DanielVelkov It will emit a warning "Non exhaustive patterns", like the code posted by nh2 in his question. – chi Aug 29 '14 at 17:21
  • @chi and what if I update the `case _x` part but not the second part of the function which generates the random values? – Daniel Aug 29 '14 at 17:22
  • @DanielVelkov This is indeed a shortcoming. That's what I meant when I wrote that the case must be kept coherent with _x, so this is not completely DRY, nor much more elegant than the original nh2's code. It does not force the instance to be correct, but it helps the programmer in remembering to fix the code for new constructors. – chi Aug 29 '14 at 17:29
0

Here is a solution using the generic-random library:

{-# language DeriveGeneric #-}
{-# language TypeOperators #-}

import Generic.Random
import GHC.Generics
import Test.QuickCheck

data Mytype
  = C1
  | C2 Char
  | C3 Int String
  deriving Generic

instance Arbitrary Mytype where
  arbitrary = genericArbitraryG customGens uniform
    where
      customGens :: Gen String :+ ()
      customGens = someCustomGen :+ ()

someCustomGen :: Gen String
someCustomGen = undefined

genericArbitraryG takes care of generating each constructor of MyType. In this case we use uniform to get a uniform distribution of constructors. With customGens we define that each String field in Mytype is generated with someCustomGen.

See Generic.Random.Tutorial for more examples.

sjakobi
  • 3,546
  • 1
  • 25
  • 43