Code your own QuickCheck

In the next posts, we will go through a small implementation challenge. The goal will be to implement our own limited version of QuickCheck, the famous generative testing framework of Haskell.

Whether or not you do know about the concept of generative testing, I advice you to have a look at the original publication of QuickCheck. This is indeed a very good and quite enlightening read.

Our goal will be to reproduce something close to the features described in this publication, all on our own. Through this exercise, we will gain some better understanding of the magic that happens inside QuickCheck.

  • The goal of today is to implement the basic features of QuickCheck
  • The next post will be focused on testing higher order functions
  • Our last post will be focused on shrinking counter examples

We will name our own generative testing framework RapidCheck to avoid confusion when referring to the real QuickCheck.

 

Introduction


As a quick refresher, QuickCheck is a framework that allows to test properties that we think should hold on our functions. It is pretty versatile and can be used in many different contexts:

  • To test that the code your wrote does behave correctly
  • To investigate on the behaviour of some code you are reading
  • To test specifications against a working code

QuickCheck is not a proof system and is not a replacement for example based testing either. QuickCheck is more like a nasty user which will try everything he can to break your software.

Romeu Moura did a great talk explaining what property-based testing is and is not about. I encourage you to have a look at it.

 

Our goal for today


Our goal for today will be to implement a subset of our RapidCheck that supports checking some basic properties. We will keep the topic of testing higher order function for our next post.

More practically, we give ourselves the goal to test the two following properties on the Greatest Common Divisor with RapidCheck, all by the end of this post.

 
The prop_gcd is a valid property. Multiplying the Greatest Common Divisor of a and b with the Lowest Common Multiple of a and b should be equal to the produce of a and b. We expect this test to pass.

prop_gcd :: Integer -> Integer -> Bool
prop_gcd a b = a * b == gcd a b * lcm a b
rapidCheck prop_gcd
> Success

 
The prop_gcd_bad is an invalid property. The Greatest Common Divisor of two numbers might be equal to 1 if these numbers are coprime. We expect this test to fail (with high probability) and we expect RapidCheck to provide us a counterexample.

prop_gcd_bad :: Integer -> Integer -> Bool
prop_gcd_bad a b = gcd a b > 1
rapidCheck prop_gcd_bad
> Failure {seed = -1437169021,
counterExample = ["1076253199","40866101"]}

 
How does the rapidCheck function knows how to test these properties? How does it recognize which kind of input to generate? These are the questions that will be answered throughout this post.

We will use the same vocabulary used inside the original publication of QuickCheck. Our implementation will differ on several aspects from the one of the publication, in the spirit of simplifying things a bit.

 

Selecting our Outputs


Let us start by addressing the simplest of our needs: returning useful results as output to the tests.

Needless to say, because Property Based Testing is based on generating random inputs, returning “Test Failed” is not really helpful. The output should at the very least contain enough information to replay the failing scenario.

We will select two useful pieces of information in case of failure:

  1. The counter example itself (the failure inputs) as strings
  2. The random seed used to generate the counter example

This translates into the following Result type:

data Result
= Success -- In case of success, no additional information
| Failure {
seed :: Int, -- The seed used to generate the counter example
counterExample :: [String] -- The counter example (failing inputs to string)
} deriving (Show, Eq, Ord) -- Useful instances to print and compare Results

 
To help dealing with the Failure case more easily, we create the following helper function overFailure. It takes a function, applies it on the Result if it is a Failure, and returns the updated value.

overFailure :: Result -> (Result -> Result) -> Result
overFailure Success _ = Success
overFailure failure f = f failure

 
We also provide a Monoid instance for our result type. A Monoid instance is a type offering a default value, mempty in Haskell, and a binary associative operation, mappend in Haskell.

For Result, we will select a meaning that will make it easier for us to collapse several test results into one single result:

  • mempty: running no test case always results in Success
  • mappend: if at least one test case fails, the whole test fails
instance Monoid Result where
mempty = Success
mappend lhs@Failure{} _ = lhs
mappend _ rhs = rhs

Note that our implementation is left biased: it will report the leftmost failure if both test cases fail. Applied to a list of several test cases, it will therefore report the first error found in the list.

 

Generating random Inputs


In order to be able to test properties over our functions, RapidCheck needs to be able to generate some random values. We model a generator of random input of type a by the type Gen parameterized on the type it generates.

This Gen a is a wrapper around a function that takes a pseudo-random generator (an instance of StdGen from the System.Random module) and returns an element a.

We also provide an typeclass (a kind of interface) named Arbitrary for types for which a standard generator is defined.

newtype Gen a = Gen {
runGen :: StdGen -> a
}
class Arbitrary a where
arbitrary :: Gen a

 
Note that the function wrapped in Gen is pure. Calling it twice with the same random generator will return the same result twice. It means that the caller of the function is responsible for providing an updated pseudo random generator to the function to get different results.

This might seem awkward, but we have to remember that our goal is to output reproducible test cases for the API user. Being able to replay a failing scenario deterministically is an important feature. A pure function is the safest way for us to guaranty this property.

 

Property


The goal of our RapidCheck library is to check to properties on our code. Properties are predicates we can run generative test against, to verify if they hold. We identify several characteristics of a Property:

  • We need to test it, and generate a Result from it
  • This result is based on randomly generated inputs

Presented differently, a Property is a computation that return a Result through a pseudo-random generation process. It means we can model a property as a wrapper around Gen Result:

newtype Property = Property {
getGen :: Gen Result
}

 
To avoid systematic unwrapping of the Property, we provide the following helper function runProp. It just calls the generator function with the provided pseudo random generator StdGen.

runProp :: Property -> StdGen -> Result
runProp prop rand = runGen (getGen prop) rand

 
This Property type might seem incomplete. In particular, there is no mention of the random inputs that drives the generation of the Result. Surely, this is a problem!

Not if we think of a Property as the result of a construction process that integrates the generation of the random inputs together with the predicate we want to verify. The actual “construction process” of a Property is the subject of the next section.

 

Testable inputs


The purpose of this section is to define the inputs of our rapidCheck function. Before we go further, let us go through an inventory phase, in which we will review what we already know and express on paper what we really want.

 

Inventory

Our ultimate goal is to test any property expressed as a function that returns something we could interpret as a Result. From now on, we will refer to such function as testable function. In particular, any function returning a Bool or a Result (like our prop_gcd function) should qualify.

prop_gcd :: Integer -> Integer -> Bool
prop_gcd a b = a * b == gcd a b * lcm a b

Each of the argument of a testable function should have a random generator associated with it. We already know how to express this constraint: all arguments should implement the Arbitrary type.

We also have a nice target abstraction for our testable functions, the type Property we introduced earlier. We want a way to transform such testable functions into a Property.

For this transformation to make sense, the generator wrapped by the Property should represent the combined effect of the generators of each of the arguments of the original function.

 

Our problem

Our problem is that there are theoretically no limits to the number of arguments this function could take. We could try to take a shortcut and provide overloads until we reached a maximum number of arguments, but we can do better.

The solution is to break down our problem and think in terms of mathematical induction. This means finding a base case, a set of function with no arguments which we can convert to a Property, and identifying a induction step to grow the number of arguments supported one at a time.

We define Testable as the typeclass that represents the functions that can be transformed into a Property (our testable function, or alternatively, something we can generate a Result from):

class Testable a where
property :: a -> Property

Our goal is identify a base case of known Testable instances, and implement an induction step to grow this set of instances.

 

Base case

Based on our inventory, we know that functions with no arguments (constants) that return a result or a boolean are convertible to a property.

These constants do not have any random inputs influencing them. As a consequence, a Property obtained from one such function should always produce the same consistent result. This is expressed in the code below:

-- Property is trivially convertible to Property
instance Testable Property where
property = id
-- Result is convertible to Property by creating
-- a generator that always return this same result
instance Testable Result where
property r = Property (Gen (const r))
-- Bool can be converted to Result
-- Result can be converted to a Property
-- By composition we get that Bool is convertible to Property
instance Testable Bool where
property = property . toResult where
toResult b = if b then Success
else Failure { seed = 0, counterExample = []}

 

Induction step

The inputs of our induction step are an instance of a Testable function, named testable and a printable argument implementing Arbitrary. Our desired output is a testable instance for the function taking the additional argument as parameter.

Let us write this desire, with the implementation left undefined:

instance
(Show a, Arbitrary a, -- Given a type `a` supporting random generation
Testable testable) -- And an already existing testable function
=> Testable (a -> testable) -- The function (a -> testable) is also testable
where
property = undefined -- With the provided implementation

Now, let us assume the existence of a function named forAll that takes a generator and our function a -> testable and return a new testable instance.

Then we could define our instance as follows, by providing the generator of our argument a to the forAll function:

instance
(Show a, Arbitrary a,
Testable testable)
=> Testable (a -> testable)
where
property f = forAll arbitrary f
forAll :: (Show a, Testable testable) => Gen a -> (a -> testable) -> Property
forAll = undefined

We just pushed the problem a bit further down the road. But the forAll really needs its next section for it alone.

 

Implementing forAll


Before diving into the implementation of the forAll function, let us note that it is clearly more general than our Testable instance.

It does not require a type to implement Arbitrary and instead asks for a generator of argument. This function is both at the heart of QuickCheck and a very important tool when dealing with random generation of inputs.

This allows to define specialized random generator for the same argument. This in turns allows to test some properties that hold in specific equivalence classes only.

 
Now, to the implementation.

The code will rely on the use of the split function of the module System.Random module. This function allows to create two StdGen pseudo random generator from a single one. It is important to note it is a pure function, meaning its result is deterministic.

The Property created by forAll is a wrapper around a generator: it takes as input a StdGen, named rand, to output a Result. Inside the returned Property, we will split this pseudo random generator in two, named rand1 and rand2.

  • We use rand1 to generate the new argument arg to the testable function
  • We use arg to get subProp, which is already an instance of testable (by our induction hypothesis and the type of forAll)
  • We use rand2 to run subProp and generate our result
  • In case of failure, we enrich the counter-example with the value of arg

You can find the full implementation below:

forAll :: (Show a, Testable testable) => Gen a -> (a -> testable) -> Property
forAll argGen prop =
Property $ Gen $ \rand -> -- Create a new property that will
let (rand1, rand2) = split rand -- Split the generator in two
arg = runGen argGen rand1 -- Use the first generator to produce an arg
subProp = property (prop arg) -- Use the `a` to access the sub-property
result = runProp subProp rand2 -- Use the second generator to run it
in overFailure result $ \failure -> -- Enrich the counter-example with `arg`
failure { counterExample = show arg : counterExample failure }

 

Implementing rapidCheck


The goal of rapidCheck is to take a Testable instance as parameter, and try its associated Property with different values of seeds, in an attempt to find a counter-example.

We will first define a function rapidCheckImpl that takes as parameter a starting seed for our pseudo random generator, and a maximum number of attempts to try to find a counter-example:

  • runOne will trigger one such attempt, with a seed provided as parameter. Upon failure, it will complete the error report with the seed.
  • runAll will run all attempts and exploit the Monoid instance for Result to collapse these results together
rapidCheckImpl :: Testable prop => Int -> Int -> prop -> Result
rapidCheckImpl attemptNb startSeed prop = runAll (property prop)
where
runAll prop = foldMap (runOne prop) [startSeed .. startSeed + attemptNb - 1]
runOne prop seed =
let result = runProp prop (mkStdGen seed)
in overFailure result $ \failure -> failure { seed = seed }

We can note that, once again, this function is side effect free. It will return a deterministic result.

 
We finally introduce the functions rapidCheck to get the non-determinism needed to solve our problem. This function produces a side-effect to get the initial seed to feed rapidCheckImpl. We also add rapidCheckWith, as a variation of rapidCheck that accept the maximum number of attempts as parameter.

rapidCheck :: Testable prop => prop -> IO Result
rapidCheck = rapidCheckWith 100
rapidCheckWith :: Testable prop => Int -> prop -> IO Result
rapidCheckWith attemptNb prop = do
seed <- randomIO
return $ rapidCheckImpl attemptNb seed prop

 
There they are, the only two functions with side effects of our RapidCheck module. It is quite remarkable how far we can push side effects away from the core implementation of a module whose main purpose is to generate random inputs to find counter-examples.

 

Enjoying rapidCheck


Now that we did all this work, it is time to enjoy. We will run our initial test cases and have a look at how it behaves.

Both of our test cases takes two integers and returns a boolean. We know from the previous sections that a function with no parameters (a constant) returning a boolean implements Testable. What is missing for the whole function to be Testable is an Arbitrary instance for our Integer type:

  • We use the pure function next of System.Random to generate a Int
  • We convert this Int to an Integer (an arbitrary large number)

Because we generated a Int, the random generator will not cover all the spectrum of values an Integer could have. We will ignore this here (*), as the only reason why we use an Integer was to avoid overflows:

instance Arbitrary Integer where
arbitrary = Gen $ \rand ->
fromIntegral (fst (next rand))

 
We expect our first test case to pass as it verifies a valid property of our GCD:

prop_gcd :: Integer -> Integer -> Bool
prop_gcd a b = a * b == gcd a b * lcm a b
rapidCheck prop_gcd
> Success

 
Our second test case should fail with high probability. RapidCheck should be able to exhibit a counter example based on two coprime numbers. And it does:

prop_gcd_bad :: Integer -> Integer -> Bool
prop_gcd_bad a b = gcd a b > 1
rapidCheck prop_gcd_bad
> Failure {seed = -1437169021,
counterExample = ["1076253199","40866101"]}

 
Note that if we had used Int instead of Integer, RapidCheck should be able to find a counter example to our valid prop_gcd properties. Because an Int has not an infinite precision, we expect RapidCheck to find integer overflow issues. And it does:

instance Arbitrary Int where
arbitrary = Gen $ \rand -> fst (next rand)
prop_gcd_overflow :: Int -> Int -> Bool
prop_gcd_overflow a b = a * b == gcd a b * lcm a b
rapidCheck prop_gcd_overflow
> Failure {seed = -881134321,
counterExample = ["171542757","1235104953"]}

 
(*) In real code, we should avoid defining Arbitrary instances with bad statistical properties (what we did for the Integer). We always have the option to define custom generators and use forAll explicitly.

We can also note that writing a random generator for arbitrary long integers is not that easy. Thankfully, QuickCheck integrates a great deal of already defined Arbitrary instances. Practically, we do not need to worry about writing a generator of Integer.

 

Conclusion and what’s next


In this post, we went over the basic building block of the QuickCheck library, by implementing our own Property Based Testing framework, RapidCheck.

We used the same vocabulary as QuickCheck to better understand how the real library works. The details of implementations are not the exact reproduction of the original library though:

  • The Gen type is not a Monad in RapidCheck, to avoid the topic entirely. It also support less feature than the QuickCheck version.
  • The Result type is much simpler in RapidCheck to concentrate on the principles rather than trying to be exhaustive.
  • As a consequence, both forAll and rapidCheck functions, although quite similar in purpose, are quite different in terms of implementation.

 
But we are not done yet. The next post will go into another magic property of QuickCheck. We will add to our RapidCheck the ability to do something quite remarkable: generating function to test higher order functions.

One thought on “Code your own QuickCheck

Add yours

Leave a comment

Create a website or blog at WordPress.com

Up ↑