5

ScalaCheck's Gen API docs explain lazy val sized:

def sized[T](f: (Int) ⇒ Gen[T]): Gen[T]

Creates a generator that can access its generation size

Looking at the following example:

import org.scalacheck.Gen.{sized, posNum}

scala> sized( s => { println(s); posNum[Int] })
res12: org.scalacheck.Gen[Int] = org.scalacheck.Gen$$anon$3@6a071cc0

scala> res12.sample
100
res13: Option[Int] = Some(12)

scala> res12.sample
100
res14: Option[Int] = Some(40)

What is the meaning of generation size, namely 100 in the above output?

Kevin Meredith
  • 41,036
  • 63
  • 209
  • 384

2 Answers2

13

sized provides access to the "size" parameter of Scalacheck. This parameter indicates how "big" the values generated by a generator should be. This parameter is useful in a couple of situations:

  • You want to limit the amount of values generated to make property generation and thus test run faster.
  • You need to fit generated data into external constraints, like form validators which check the length of a string, or databases which put limits on columns.
  • You need to generate recursive data structures and terminate at some point.

The companion of Gen.sized is Gen.resize which lets you change the size of a generator, as in Gen.resize(10, Gen.alphaNumString) which will generate a alpha-numeric string of no more than ten characters.

Most built-in generators use sized in some way, for instance Gen.buildableOf (which is the underpinning of all generators for lists and containers):

[…] The size of the container is bounded by the size parameter used when generating values.

A simple example

To get an idea of how Gen.size is used take a look at the example in "Sized Generators" (Generators, Scalacheck user guide):

def matrix[T](g: Gen[T]): Gen[Seq[Seq[T]]] = Gen.sized { size =>
  val side = scala.math.sqrt(size).asInstanceOf[Int]
  Gen.listOfN(side, Gen.listOfN(side, g))
}

This generator uses the "size" to limit the dimensions of the matrix so that the entire matrix will never have more than entries than the "size" parameter. In other words, with a size of 100 as in your question, the generated matrix would have 10 rows and 10 columns, amounting to a total of 100 entries.

Recursive data structures

"size" is particularly useful to make sure that generators for recursive data structures terminate. Consider the following example which generates instances of a binary tree and uses size to limit the height of each branch to make sure that the generator terminates at some point:

import org.scalacheck.Gen
import org.scalacheck.Arbitrary.arbitrary

sealed abstract class Tree
case class Node(left: Tree, right: Tree, v: Int) extends Tree
case object Leaf extends Tree

val genLeaf = Gen.const(Leaf)
val genNode = for {
  v <- arbitrary[Int]
  left <- Gen.sized(h => Gen.resize(h/2, genTree))
  right <- Gen.sized(h => Gen.resize(h/2, genTree))
} yield Node(left, right, v)

def genTree: Gen[Tree] = Gen.sized { height =>
  if (height <= 0) {
    genLeaf
  } else {
    Gen.oneOf(genLeaf, genNode)
  }
}

Note how the generator for nodes recursively generates trees, but allows them only half of the "size". The tree generator in turn will only generate leafs if its size is exhausted. Thus the "size" of the generator is an upper bound for the height of the generated tree, ensuring that the generator terminates at some point and does not generate excessively large trees.

Note that the size only sets an upper bound for the height of the tree in this example. It does not affect the balancing of the generated tree or the likeliness of generating a tree with a certain depth. These depend solely on the bias defined in genTree.

With oneOf each subtree has a 50% chance of being just a Leaf, ending the growth of the tree at this branch, which makes generating a complete tree that exhausts the "whole" size somewhat unlikely.

frequency (see below) let's you encode a different bias. In the example below nodes a way more likely than leafs, so the tree generated by the generator below is more likely to grow, but it's still unlikely to be complete.

Relation to Gen.frequency

Gen.frequency is for a different use case: You would not use it to limit the depths or size of a data structure, but to add a certain bias to a choice of generators. Take a look at the definition of Gen.option:

def option[T](g: Gen[T]): Gen[Option[T]] =
  frequency(1 -> const(None), 9 -> some(g))

This definition uses frequency to make the less-interesting case of None less likely than the more interesting case of Some.

In fact, we could combine Gen.sized and Gen.frequency in our binary tree example above to make genTree more likely to generate "interesting" nodes rather than "boring" leafs:

def genTree: Gen[Tree] = Gen.sized { height =>
  if (height <= 0) {
    genLeaf
  } else {
    Gen.frequency(1 -> genLeaf, 9 -> genNode)
  }
}
ashawley
  • 4,195
  • 1
  • 27
  • 40
  • Thanks for the detailed answer. So using Gen.frequency to try to achieve termination of a generator for a recursive data structure is a bit of a hack, but likely to work? – Kevin Meredith Mar 17 '17 at 12:10
  • Thanks for that update - it cleared up my misunderstanding! With respect to using `Gen.resize( n / 2, ...)`, won't that mean that most of the generated data has the same depth. In other words, if my initial `size` is 100, then each test recursive structure will consist of depth 8: `100 -> 50 -> 25 -> 12 -> 6 -> 3 -> 1 -> 0`, no? Put differently, when the `height <= 0`, then `genLeaf`'s return means that the tree has completed, i.e. no more branching. – Kevin Meredith Mar 19 '17 at 02:53
  • Also, rather than use the idiomatic 'sized', we could've passed a 'counter' int value, terminating at <= 0, no? – Kevin Meredith Mar 19 '17 at 13:08
  • 1
    @KevinMeredith I edited my question again. In short, the size is the upper bound for the height, but does not affect the likeliness of nodes or leafs. These are solely controlled by the bias in `genTree`, i.e. whether you use oneOf or frequency, and what parameters for frequency. As for the counter, yes you could do that, and you would, if you would like to control just the height of the tree, but not affect generators for the data contained in a node. –  Mar 20 '17 at 11:10
1

Generation size is the number of results the generator will produce. The sized method simply lets you write generators that know their own size so you can use that information as a factor in what you generate.

For example, this generator (from this resource) produces two lists of numbers where 1/3 are positive and 2/3 are negative:

import org.scalacheck.Gen
import org.scalacheck.Prop.forAll

val myGen = Gen.sized { size =>
  val positiveNumbers = size / 3
  val negativeNumbers = size * 2 / 3
  for {
    posNumList <- Gen.listOfN(positiveNumbers, Gen.posNum[Int])
    negNumList <- Gen.listOfN(negativeNumbers, Gen.posNum[Int] map (n => -n))
  } yield (size, posNumList, negNumList)
}

check {
  forAll(myGen) {
    case (genSize, posN, negN) =>
     posN.length == genSize / 3 && negN.length == genSize * 2 / 3
  }
}

So sort of like zipWithIndex in Scala collections, sized just provides you meta information to help you do what you need to do.

Vidya
  • 29,932
  • 7
  • 42
  • 70