Haskell has the nice property that the type signature for a piece of code sometimes tells you what that code does. And that got me thinking... can you construct tests based on type signatures?
Consider, for example, a module such as `Data.Map`. It defines a data type and several functions that operate on that type. Looking only at the type signatures, it should be possible [in principle] to figure out all possible ways of constructing `Map` values. And by that I mean algorithmically possible: it ought to be possible to write a program that takes the interface to your module and figures out all possible function calls you could make against it.
Now, there are those who feel that a well-written test suite can be seen as the "specification" of what a library is supposed to do. Obviously the human who wrote the code knows what they're trying to do, and the machine doesn't. But the machine should be able to work out what it's possible to ask for, given the type signatures that you've provided.
As a more specific case, suppose there is some set of invariants that are supposed to apply to all possible `Map` values. (E.g., `null m == (size m == 0)`, or perhaps an internal sanity-check function should always return true.) One might imagine writing a test framework, handing it the module, and saying "a value of type `Map X Y` should always satisfy these invariants", and letting the test framework go off and try to execute every possible combination of the functions you've provided to produce maps and check that they satisfy all the conditions. If not, it tells you which condition is violated and the expression needed to construct the invalid value.
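To make that concrete, here's a minimal hand-coded sketch of what such a framework might do for `Data.Map`. In the sketch I write out the expression type and the interpreter (`MapExpr`, `buildMap`, `exprsUpTo` are my own hypothetical names) by hand; the point of the proposed framework would be to derive them automatically from the module's type signatures.

```haskell
import qualified Data.Map as Map

-- One constructor per API call that yields a Map.
-- (Hand-picked here; the framework would discover these from signatures.)
data MapExpr = Empty
             | Insert Int Int MapExpr
             | Delete Int MapExpr
  deriving Show

-- Interpret an expression back into a real Map.
buildMap :: MapExpr -> Map.Map Int Int
buildMap Empty          = Map.empty
buildMap (Insert k v e) = Map.insert k v (buildMap e)
buildMap (Delete k e)   = Map.delete k (buildMap e)

-- Exhaustively enumerate expressions up to a given depth,
-- drawing keys and values from a small pool.
exprsUpTo :: Int -> [MapExpr]
exprsUpTo 0 = [Empty]
exprsUpTo n = Empty : [ f e | e <- exprsUpTo (n - 1)
                            , f <- [Insert k v | k <- pool, v <- pool]
                                ++ [Delete k | k <- pool] ]
  where pool = [0, 1]

-- The invariant from the text: null m == (size m == 0).
invariant :: Map.Map Int Int -> Bool
invariant m = Map.null m == (Map.size m == 0)

-- Report any expression whose resulting Map violates the invariant.
violations :: Int -> [MapExpr]
violations depth = [ e | e <- exprsUpTo depth
                       , not (invariant (buildMap e)) ]

main :: IO ()
main = case violations 3 of
  [] -> putStrLn "invariant holds for all expressions up to depth 3"
  es -> mapM_ print es
```

Note that because each violating `MapExpr` carries the full sequence of calls that produced it, the framework gets the "expression needed to construct this invalid value" for free, which is exactly the error report described above. A real implementation would presumably use random sampling (QuickCheck-style) rather than exhaustive enumeration once the API grows.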
My question is, does this sort of approach sound tractable? Desirable? Interesting? How would you approach solving a problem like this? Djinn seems to already know how to construct values of specified types given preexisting functions and constants, so trying to come up with multiple expressions having a given type doesn't sound too hard. What kind of heuristics might one apply to attempt to get good code coverage? (Obviously analysing code rather than just types is drastically harder... Rice's theorem lurks in wait.)
(Note that I'm not suggesting that this kind of machine checking should replace hand-written test code. The idea is more to augment it; perhaps in harder cases, the machine could cough up possible expressions, and then ask a human what the "right" answer ought to be. This can then be recorded as a new test case, to be run whenever the test suite runs.)