I am trying to develop property-based tests for a matching algorithm, and I need to generate two input sets of the same size to feed into the algorithm. My current attempt at a solution is the following.

case class Man(id: Long, quality: Long, ordering: Ordering[Woman])
case class Woman(id: Long, quality: Long, ordering: Ordering[Man])

val man: Gen[Man] = {
  for {
    id <- Gen.posNum[Long]
    quality <- Gen.posNum[Long]
  } yield Man(id, quality, Man.womanByQuality)
}

val woman: Gen[Woman] = {
  for {
    id <- Gen.posNum[Long]
    quality <- Gen.posNum[Long]
  } yield Woman(id, quality, Woman.manByQuality)
}  

def setOfN[T](n: Int, g: Gen[T]): Gen[Set[T]] = {
  Gen.containerOfN[Set, T](n, g)
}

def unMatched: Gen[(Set[Man], Set[Woman])] = Gen.sized {
  n => setOfN(n, man).flatMap(ms => setOfN(ms.size, woman).map(ws => (ms, ws)))
}

This generates tuples of input sets as required, but they are not guaranteed to be the same size. When I run the test using...

property("all men and women are matched") = forAll(unMatched) {
  case (ms, ws) =>
    println((ms.size, ws.size))
    val matches = DeferredAcceptance.weaklyStableMatching(ms, ws)
    (matches.size == ms.size) && (matches.size == ws.size)
}

The console will print something like...

(0,0)
(1,1)
(2,2)
(3,2)
(1,2)
(0,2)
(0,1)
(0,0)
! marriage-market.all men and women are matched: Exception raised on proper
  ty evaluation.
> ARG_0: (Set(),Set(Woman(1,1,scala.math.Ordering$$anon$10@3d8314f0)))
> ARG_0_ORIGINAL: (Set(Man(3,1,scala.math.Ordering$$anon$10@2bea5ab4), Man(
  2,1,scala.math.Ordering$$anon$10@2bea5ab4), Man(2,3,scala.math.Ordering$$
  anon$10@2bea5ab4)),Set(Woman(1,1,scala.math.Ordering$$anon$10@3d8314f0), 
  Woman(3,2,scala.math.Ordering$$anon$10@3d8314f0)))
> Exception: java.lang.IllegalArgumentException: requirement failed
scala.Predef$.require(Predef.scala:264)
org.economicsl.matching.DeferredAcceptance$.weaklyStableMatching(DeferredAc
  ceptance.scala:97)
org.economicsl.matching.MarriageMarketSpecification$.$anonfun$new$2(Marriag
  eMarketSpecification.scala:54)
org.economicsl.matching.MarriageMarketSpecification$.$anonfun$new$2$adapted
  (MarriageMarketSpecification.scala:51)
org.scalacheck.Prop$.$anonfun$forAllShrink$2(Prop.scala:761)
Found 1 failing properties.

Process finished with exit code 1

The test fails because I have included a requirement that the two input sets must be of equal size. My intent is that the generator should supply valid input data.

Thoughts?

davidrpugh
  • Is the core of your problem that you're not able to generate a pair of collections with the same size, or that the generated items may not be unique and hence the size of a Set (which removes duplicates) decreases? – FlorianK Apr 29 '18 at 13:55
  • The core of my problem is that I am not able to generate a pair of collections of the same size. My case classes for `Man` and `Woman` have a field for a unique `id: Long`, and my `Gen[Man]` and `Gen[Woman]` are defined such that they should always generate distinct values when sampled. – davidrpugh Apr 30 '18 at 02:19

4 Answers

Problem: Construct a generator of type Gen[(Set[T],Set[U])] such that for each generated pair of sets, each set in the pair has the same size.

The following function

import org.scalacheck.Gen
def genSameSizeSets[T,U](gt: Gen[T], gu: Gen[U]): Gen[(Set[T],Set[U])] = {
  for { n      <- Gen.posNum[Int] // Int, as containerOfN requires; or .oneOf(1 to MAX_SET_SIZE)
        tset   <- Gen.containerOfN[Set,T](n, gt)
        uset   <- Gen.containerOfN[Set,U](n, gu)
        minsize = Math.min(tset.size, uset.size)
  } yield (tset.take(minsize), uset.take(minsize))
}

constructs the desired generator.

A key point about this generator is that it completely avoids discarding candidates.

containerOfN by itself can't guarantee the size of the resulting Set since that would require gt and gu to generate n consecutive distinct values.

An alternative implementation would have been to put a guard if clause in the for comprehension

if tset.size == uset.size

That was my first attempt. It was not a robust generator because it had a high discard ratio, and ScalaCheck gave up before the property passed.

In this case, there is an easy way out. Rather than discard mismatched candidates, just coerce the larger to the same size as the smaller (which is still non-empty). Since the set values are arbitrary, it doesn't matter which are discarded. This logic is implemented with Math.min and take.
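
The trimming step can be looked at in isolation, without ScalaCheck. The following is a minimal sketch in plain Scala; `TrimExample` and `trimToSameSize` are hypothetical names introduced here for illustration, not part of any library. It shows a set collapsing duplicate draws and the two sets then being cut to a common size.

```scala
object TrimExample {
  // Hypothetical helper: cut both sets down to the size of the smaller one.
  // Which elements survive is unspecified, which is fine for arbitrary test data.
  def trimToSameSize[T, U](ts: Set[T], us: Set[U]): (Set[T], Set[U]) = {
    val minSize = math.min(ts.size, us.size)
    (ts.take(minSize), us.take(minSize))
  }

  // Five "draws" with duplicates collapse to three distinct elements.
  val ints: Set[Int] = Set(1, 2, 2, 3, 3)
  // Five distinct draws stay five elements.
  val chars: Set[Char] = Set('a', 'b', 'c', 'd', 'e')

  val (trimmedInts, trimmedChars) = trimToSameSize(ints, chars)
}
```

Nothing is thrown away and retried here: a mismatched pair is repaired in place, so the generator never has to discard a candidate.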

This seems to be an important principle of good generator design: "avoid discards like the plague".

Here's a complete working example:

import org.scalacheck.Properties
import org.scalacheck.Gen
import org.scalacheck.Arbitrary
import org.scalacheck.Prop.{forAll,collect}

object StackOverflowExample extends Properties("same size sets") {

  def genSameSizeSets[T,U](gt: Gen[T], gu: Gen[U]): Gen[(Set[T],Set[U])] = {
    for { n <- Gen.posNum[Int]
          ts <- Gen.containerOfN[Set,T](n, gt)
          us <- Gen.containerOfN[Set,U](n, gu)
          minsize = Math.min(ts.size, us.size)
    } yield (ts.take(minsize), us.take(minsize))
  }

  val g = genSameSizeSets(Arbitrary.arbitrary[Int], Arbitrary.arbitrary[Char])

  property("same size")  = forAll(g) { case (intSet, charSet) =>
    collect(intSet.size, charSet.size) { intSet.size == charSet.size }
  }


}

with this output

+ same size sets.same size: OK, passed 100 tests.
> Collected test data: 
8% (11,11)
7% (2,2)
7% (17,17)
6% (16,16)
<snip>
1% (44,44)
1% (27,27)
1% (26,26)
1% (56,56)
Paul

I have stumbled across the following solution.

def unMatched: Gen[(Set[Man], Set[Woman])] = Gen.sized {
  n => setOfN(n, man).flatMap(ms => setOfN(ms.size, woman).map(ws => (ms, ws))).suchThat { case (ms, ws) => ms.size == ws.size }
}

But I don't think it should be necessary to use the suchThat combinator. The issue seems to be that the size parameter is treated as an upper bound on the size of the container (rather than as an equality constraint).

Updated based on comments from @FlorianK

I discovered that the problem was with my specification of the Man and Woman generators: they were not generating distinct values. Instead of using a positive Long to represent the unique id, I switched to using a Java UUID (so the id field of Man and Woman becomes a java.util.UUID). The corrected generators are

val man: Gen[Man] = {
  for {
    id <- Gen.uuid
    quality <- Gen.posNum[Long]
  } yield Man(id, quality, Man.womanByQuality)
}

val woman: Gen[Woman] = {
  for {
    id <- Gen.uuid
    quality <- Gen.posNum[Long]
  } yield Woman(id, quality, Woman.manByQuality)
}

I am not quite sure why the original generators did not work as expected. It was certainly possible for them to generate non-unique instances but I thought it should have been exceedingly rare (guess I was wrong!).
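
One likely explanation, offered as a sketch rather than a definitive diagnosis: ScalaCheck documents `Gen.posNum` as having an upper bound equal to the generation size parameter, and early test cases run at small sizes. At size 3, for example, an id can only be 1, 2 or 3, so collisions are common rather than rare. The object name `PosNumCollisions` below is made up for illustration, and `Gen.resize` is used only to pin the size parameter.

```scala
import org.scalacheck.Gen

object PosNumCollisions {
  // Ten draws of a positive Long id, collected into a Set (duplicates collapse).
  val ids: Gen[Set[Long]] = Gen.containerOfN[Set, Long](10, Gen.posNum[Long])

  // Pin the size parameter to 3: posNum then draws from a range of only three values,
  // so the resulting set cannot hold more than three ids despite ten draws.
  val atSmallSize: Option[Set[Long]] = Gen.resize(3, ids).sample

  def main(args: Array[String]): Unit =
    println(atSmallSize.map(_.size))
}
```

This is consistent with the small id and quality values in the failing arguments shown in the question, such as Man(2,1,...) and Man(2,3,...).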

davidrpugh

I managed to generate pairs of List[Int] of equal size with the following code:

val pairOfListGen = Gen.sized { size => for {
    x <- Gen.containerOfN[List, Int](size, Gen.choose(0,50000))
    y <- Gen.containerOfN[List, Int](size, Gen.choose(0,50000))
  } yield (x,y)
}

The Man.womanByQuality is not defined in your code sample, so I was not able to test it with your generators, but I hope this works for you.

FlorianK
  • This doesn't seem to generate lists of the same size. When I use your generator and insert a `println` call to print the size of `x` and `y` as described above I get different sizes for `x` and `y`. – davidrpugh May 01 '18 at 06:12
  • That surprises me. I had tested with `property("test") = forAll(pairOfListGen) { case (x,y) => println(x.size, y.size); true}` and it lists pairs of the same numbers. Also, when I remove the .size I see pairs of different lists with the same sizes, exactly as I would expect. – FlorianK May 02 '18 at 08:37

I just answered a related question from the crypt here

def pairOfLists[T1, T2](t1g: Gen[T1], t2g: Gen[T2]): Gen[List[(T1, T2)]] = for {
  t1s <- Gen.listOf(t1g)
  t2s <- Gen.listOfN(t1s.size * 3, t2g).suchThat(_.size >= t1s.size)
} yield t1s.zip(t2s.take(t1s.size))

Note that suchThat discards t2s candidates that are too small, and every discard brings the generator closer to failing. Requesting t1s.size * 3 elements attempts to mitigate this by making it more likely that the t2s candidate is larger than necessary; it is then cut short with t2s.take(t1s.size).
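
For lists, a different technique avoids the filter entirely: generate the pairs directly and split them afterwards. This is a sketch of that alternative, not the code above; `PairedLists` and `sameSizeLists` are hypothetical names, and it assumes the two element generators are independent of each other.

```scala
import org.scalacheck.Gen

object PairedLists {
  // Generate a list of pairs in one go, so both sides have the same length by construction.
  def pairOfLists[T1, T2](t1g: Gen[T1], t2g: Gen[T2]): Gen[List[(T1, T2)]] =
    Gen.listOf(Gen.zip(t1g, t2g))

  // Split the pairs back into two lists of equal length.
  def sameSizeLists[T1, T2](t1g: Gen[T1], t2g: Gen[T2]): Gen[(List[T1], List[T2])] =
    pairOfLists(t1g, t2g).map(_.unzip)

  // One sample drawn with ScalaCheck's default parameters.
  val example: Option[(List[Int], List[Char])] =
    sameSizeLists(Gen.choose(0, 9), Gen.alphaChar).sample
}
```

This only carries over to lists. Converting either side to a Set can still shrink it when elements repeat, so the set case still needs distinct elements or a trimming step.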

Darren Bishop