The current Arbitrary instance used to test Data.Set is too complicated for my taste. I don't really understand it, so I don't really trust it. I came up with the idea of separating the shape generation from the value generation.

Using

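import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT)
import Test.QuickCheck (Gen)

-- Monads that can run a QuickCheck generator.
class Monad m => MonadGen m where
  liftGen :: Gen a -> m a
instance MonadGen Gen where
  liftGen = id
instance MonadGen m => MonadGen (StateT s m) where
  liftGen = lift . liftGen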

I can write

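-- Build a balanced tree with exactly n elements. `step` runs once per element,
-- in left-to-right order, so it must yield strictly increasing values.
mkArb :: MonadGen m => m a -> Int -> m (Set a)
mkArb step n
  | n <= 0 = pure Tip
  | n == 1 = singleton <$> step
  | n == 2 = do
      dir <- liftGen arbitrary
      p <- step
      q <- step
      if dir
        then pure (Bin 2 q (singleton p) Tip)
        else pure (Bin 2 p Tip (singleton q))
  | otherwise = do
      -- Subtree sizes must satisfy ln <= 3*rn and rn <= 3*ln, with ln + rn = n - 1.
      let upper = (3*(n - 1)) `quot` 4
      let lower = (n + 2) `quot` 4
      ln <- liftGen $ choose (lower, upper)
      let rn = n - ln - 1
      (\lt x rt -> Bin n x lt rt) <$> mkArb step ln <*> step <*> mkArb step rn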

Then I can use something like StateT s Gen to populate the set with strictly increasing elements.
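
As a rough sketch of what such a `step` could look like (the names `stepWithGap`, `increasingList` and `strictlyIncreasing` are hypothetical, and the uniform gap distribution is only an assumption), the state can hold the last value handed out, and each call can advance it by a random positive gap:

```haskell
import Control.Monad (replicateM)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, evalStateT, get, put)
import Test.QuickCheck (Gen, choose)

-- Hypothetical step action: advance the stored value by a random gap in
-- [1, maxGap] and return the new value. Because the gap is at least 1,
-- successive results are strictly increasing.
stepWithGap :: Int -> StateT Int Gen Int
stepWithGap maxGap = do
  prev <- get
  gap <- lift (choose (1, maxGap))
  let next = prev + gap
  put next
  pure next

-- Run the step n times, starting just above 0, to see what it produces.
increasingList :: Int -> Int -> Gen [Int]
increasingList maxGap n = evalStateT (replicateM n (stepWithGap maxGap)) 0

-- True when every element is strictly smaller than its successor.
strictlyIncreasing :: Ord a => [a] -> Bool
strictlyIncreasing xs = and (zipWith (<) xs (drop 1 xs))
```

With the `MonadGen` instances above, `lift` here plays the role of `liftGen`, and something like `evalStateT (mkArb (stepWithGap 5) n) 0` should then fill the tree in order. A small `maxGap` gives dense values and a large one gives sparse values.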

I have two questions:

  1. Do I successfully generate all balanced tree shapes? How can I check?

  2. What would be a good way to fill in values? I want sets with dense areas and with sparse areas. When I generate two sets, I want them to overlap sometimes but not always, and likewise their ranges. I don't really have a good sense of how to accomplish these goals.
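
One possible way to approach the first question for small sizes is brute force. The sketch below uses a stand-in `Shape` type rather than the real `Set` constructors, and assumes the balance condition uses a factor of 3 as in the `balanced` predicate of Data.Set. It enumerates every shape the case split in `mkArb` can reach (each random choice replaced by enumeration) and compares that with every balanced binary tree shape of the same size. All names here are hypothetical.

```haskell
import Data.List (sort)

-- Stand-in for the shape of a Set: element values are ignored.
data Shape = Leaf | Node Shape Shape
  deriving (Eq, Ord, Show)

size :: Shape -> Int
size Leaf = 0
size (Node l r) = 1 + size l + size r

-- Balance condition, assuming a factor of 3 between sibling sizes.
balanced :: Shape -> Bool
balanced Leaf = True
balanced (Node l r) =
  (size l + size r <= 1 || (size l <= 3 * size r && size r <= 3 * size l))
    && balanced l && balanced r

-- Every binary tree shape with n nodes, balanced or not.
allShapes :: Int -> [Shape]
allShapes 0 = [Leaf]
allShapes n =
  [ Node l r
  | ln <- [0 .. n - 1], l <- allShapes ln, r <- allShapes (n - 1 - ln) ]

-- Every shape reachable by the same case split as mkArb.
mkArbShapes :: Int -> [Shape]
mkArbShapes n
  | n <= 0 = [Leaf]
  | n == 1 = [Node Leaf Leaf]
  | n == 2 = [Node (Node Leaf Leaf) Leaf, Node Leaf (Node Leaf Leaf)]
  | otherwise =
      [ Node l r
      | ln <- [(n + 2) `quot` 4 .. (3 * (n - 1)) `quot` 4]
      , l <- mkArbShapes ln
      , r <- mkArbShapes (n - 1 - ln)
      ]

-- Do the two enumerations agree for size n?
coversAllBalanced :: Int -> Bool
coversAllBalanced n =
  sort (mkArbShapes n) == sort (filter balanced (allShapes n))
```

Evaluating `all coversAllBalanced [0 .. 10]` in GHCi checks the small sizes. This only compares which shapes are reachable. It says nothing about how often each one is generated, and the exhaustive enumeration grows too quickly to use for large n.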

dfeuer
  • You can generate occasionally overlapping sets by taking random subsets of a given set - the larger the subsets, the larger the probability of overlap. I think the same would apply to the ranges of the sets. I'm not sure what dense and sparse areas of sets are. – user2407038 Aug 04 '16 at 07:07
  • If you want to check the implementation of Set I would have thought the best option would be to generate Arbitrary instances for its internal data structures. – Paul Johnson Aug 04 '16 at 09:10
  • By "balanced tree shapes" it appears you mean satisfying the [__balanced__ predicate](https://github.com/haskell/containers/blob/45bfe23ba21bb43b8c48ca8600c3becf5284cc1c/Data/Set/Base.hs#L1539)? – ErikR Aug 04 '16 at 16:49
  • @ErikR, yes, that's what I mean. – dfeuer Aug 04 '16 at 17:02
  • @PaulJohnson, that's pretty close to what I'm doing here. For each tree size, I'm randomly choosing subtree sizes that satisfy the balance conditions. But the actual values must be strictly increasing, so I can't wing that as much. One option would be to generate arbitrary lists, sort them, remove duplicates, etc. Another option would be to try something more Brownian-style, picking random gaps. I don't know what's best. – dfeuer Aug 04 '16 at 23:16
  • @user2407038, by dense and sparse I mean smaller and larger gaps between consecutive elements. These modify the probability of collision on insertion, union, intersection, difference, etc. Your random subsets idea is interesting; it may make sense to consider writing some of the set-combination tests to use `mkArb` to generate related sets rather than just grabbing two `arbitrary` ones. If I use that idea, how would you like to be credited? – dfeuer Aug 04 '16 at 23:22
  • @dfeuer If you want sparse and dense sets, pass the sequence [x0, x1..] for `step`, where the difference between xi and x(i+1) is a random number between 1 and m. When m is close to 1, you get dense sets, and when it is large, you get sparse sets. (Credit? It seems to me if you use this idea, you are doing most of the work. Ideas themselves are a dime a dozen - implementation is hard) – user2407038 Aug 04 '16 at 23:57
  • @dfeuer You do know that there is an "orderedList" generator in Test.QuickCheck.Arbitrary? – Paul Johnson Aug 05 '16 at 08:01
  • @PaulJohnson, yes, I do. It does not ensure the lists are *strictly* increasing, which I need. I also don't know what their distribution looks like at all. Do you know how I could find out? – dfeuer Aug 05 '16 at 13:21
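
On the last point, one way to get a feel for the distribution might be to let QuickCheck tabulate the gaps between consecutive elements. This is only a sketch: `dedupSorted`, `strictAscList` and `prop_gapHistogram` are hypothetical names, the element type is fixed to `Int` for illustration, and `tabulate` assumes QuickCheck 2.12 or later.

```haskell
import Test.QuickCheck (Gen, Property, forAll, orderedList, tabulate)

-- Drop adjacent duplicates from an already sorted list, which turns a
-- non-decreasing list into a strictly increasing one.
dedupSorted :: Eq a => [a] -> [a]
dedupSorted (x : y : rest)
  | x == y = dedupSorted (y : rest)
  | otherwise = x : dedupSorted (y : rest)
dedupSorted xs = xs

-- orderedList with duplicates removed.
strictAscList :: Gen [Int]
strictAscList = dedupSorted <$> orderedList

-- Always passes; its purpose is the table of gap sizes it reports.
prop_gapHistogram :: Property
prop_gapHistogram =
  forAll strictAscList $ \xs ->
    let gaps = zipWith (-) (drop 1 xs) xs
    in tabulate "gap" (map show gaps) True
```

Running `quickCheck prop_gapHistogram` should print the percentage of each gap size seen across the generated lists. The same property could be pointed at any other generator of increasing lists to compare distributions.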