Property-based testing with QCheck
Lecture 4 gave you OUnit2 and a habit
for it: pick an input, write down its expected output, compare,
report. The hand-written assertions in test_lifo exercise
a stack on three specific inputs out of the infinite number of
ways someone might use a stack. The five max3 assertions in
the opening lecture
exercise the function on five specific triples out of the trillions
of int * int * int values; the
test-design audit showed that those
five miss one of the function's four paths entirely.
That gap is the topic of this lecture. There is a fundamental limit to unit testing: you can only check the cases you thought of. Even the disciplined version of "thinking of cases", the boundary-and-partition design you practised two lectures ago, still hand-picks one representative per region. A bug that only fires on an input you did not consider sits quietly in the code, passing your test suite, until a user finds it.
Property-based testing closes that gap by inverting the relationship between the test author and the inputs. Instead of writing down a list of (input, expected output) pairs, you write down a property: a relationship between inputs and outputs that should hold for any input. A property-based testing library then generates inputs and checks the property on each one. A few hundred random inputs cost you almost nothing and exercise corners you would never have written by hand.
This lecture is the introduction to the OCaml flavour of that idea: the QCheck library.
The limit of hand-written tests
Let us sharpen the problem. We have a function you have written
at least twice in this course, recursively in the functions
module and via fold in the higher-order module; here it
returns as a test subject:
We write a unit-test suite for it:
Four assertions. The function passes all four. Are we done?
Maybe. The implementation is right. But we tested it on four inputs whose answer was obvious to write by hand. We did not test it on:
- a list of length 100,
- a list with duplicates,
- a list of strings (instead of
ints), - a list where every element is the same,
- a list whose elements are pairs.
In each of these, the function would have worked, but we did not check. And if the implementation had a subtle bug, that bug might only fire on, say, lists of length 100 or more (a recursion- depth issue, perhaps), or on lists with duplicates (a hash-set bug). Our four-case test suite would not catch it.
The right reaction is not "write fifty more cases." That is
exhausting and quickly hits diminishing returns: there are
infinitely many lists, and you do not know which finite subset is
representative. The right reaction is to step back and ask: what
should be true of rev, no matter what list we apply it to?
And then let a machine generate the lists.
Properties: what should be true of rev?
A property is a statement that should hold for every input of
some shape. For rev, three obvious properties:
- Involution:
rev (rev xs) = xsfor every listxs. - Length preservation:
List.length (rev xs) = List.length xs. - Permutation: the multiset of elements is unchanged.
The first is the most striking. It captures something that has to
be true by the meaning of rev: reversing twice gets you back.
Notice that we did not write down what rev produces; we wrote
down a relationship the output bears to the input. This is a
key feature of properties: they often abstract over the answer.
You do not say "rev of [1; 2; 3] is [3; 2; 1]"; you say "rev
is self-inverse." The first form needs you to pick an input and
work out the answer. The second form does not.
The second property, length preservation, is a consequence of
the contract of rev: it rearranges elements, it does not add or
drop any. If a buggy rev ever returned a list of a different
length, this property would catch it without us having to write
down the exact answer.
The third, permutation, is even more specific: it says the
elements are the same, just in a different order. Combined with
length preservation, it pins down a much smaller space of
possible behaviours. Smaller, but not a single point: the
identity function preserves the multiset and is its own inverse,
so it passes all three properties and is clearly not rev. What
none of them mention is order; a property that does, say
relating rev (x :: xs) to rev xs @ [x], is what separates
rev from the impostors. Properties constrain the behaviour;
they rarely characterise it, and noticing which behaviours
slip through is part of the craft.
QCheck: the API in one example
Now the OCaml part. The QCheck library lets us write the involution property like this:
Five pieces:
QCheck.Test.makeis the constructor for a property-based test. It returns aQCheck.Test.tyou can hand to the runner.~nameis the display string in the output. Same role as the name on an OUnit2 case.~countis how many random inputs to try. Default is 100; we bumped it to 1000 because cheap.QCheck.(list int)is a generator: a recipe for producing randomint lists.QCheck.intis a generator for random ints;QCheck.list glifts a generatorgover elements into a generator for lists of elements.(fun xs -> ...)is the property: a function that receives a generated input and returns abool.truemeans the property held;falsemeans it failed.
To run the test:
That form works right here in the page's cells: the runner's
report lands in the cell's output pane, and the call returns 0
when everything passed. (~colors:false because the output pane
is not a terminal: without it the report arrives wrapped in the
ANSI colour codes meant for one, and they render as garbage.) In
a dune project you would instead end the test file with the
variant below, which additionally parses command-line flags and
exits with the right status code:
let () =
QCheck_runner.run_tests_main [test_rev_involutive]
QCheck_runner.run_tests_main is the entry point; it parses CLI
flags (so you get -v, -s for seed, etc. for free) and prints
results. A passing run reports something like:
random seed: 42
Law rev is involutive: OK (passed 1000 tests).
A thousand random lists, all satisfying rev (rev xs) = xs. We
have not exhausted the input space (no machine can; there are
infinitely many lists), but the law has held against 1000
adversaries we did not pick, which is a stronger signal than four
hand-picked cases.
When a property fails
A passing report is only half the story; the reason PBT earns its keep is what happens when the property is false. Let us assert a false belief on purpose: that every list is already sorted.
Run it. QCheck needs only a handful of draws to find a list that is not sorted, and the report names the offender (the exact values vary with the random seed):
--- Failure ------------------------------------------------------
Test every list is already sorted failed (66 shrink steps):
[0; -1]
failure (1 tests failed, 0 tests errored, ran 1 tests)
Two things deserve attention. First, the failing input is
printed, which is why a generator carries a printer alongside
the random producer. Second, look how small it is: the first
unsorted list QCheck stumbled on was almost certainly dozens of
elements long, but after "66 shrink steps" it reported a
two-element list, the smallest list that is not sorted. That
minimisation step is called shrinking, and it has its own
section later in this lecture.
The return value 1 is the number of failed tests, which is what
makes the dune exit-code wiring work.
Generators
The library's interesting work is in its generators. A
QCheck.arbitrary 'a is a
value bundling three things: a random producer of 'a values, a
printer (for failure messages), and a shrinker (more on
shrinking in a moment).
The basic generators are exactly what you expect:
QCheck.int: a uniformly randomint.QCheck.small_int: small positiveints (handy for sizes).QCheck.bool:trueorfalse.QCheck.string: a random string.QCheck.float: a randomfloat, includinginfinityandnan(but only rarely; do not count on a run hitting them).
And combinators that build bigger generators out of smaller ones:
QCheck.list g: a list of values fromg.QCheck.array g: an array of values fromg.QCheck.pair g1 g2: a pair.QCheck.option g:Nonehalf the time,Some (g ())the rest.QCheck.oneof [g1; g2; ...]: pick one of the given generators.QCheck.map f g: producef (g ()).
These compose: QCheck.(list (pair int string)) is a generator for
random lists of (int, string) pairs.
Each test you write picks a generator that matches the type of input the function under test expects, and that is most of the configuration work.
Properties are pure functions
Notice that the property fun xs -> rev (rev xs) = xs is a
pure OCaml function. No state, no IO, just an input and a
boolean output. QCheck does not need any special syntax; it just
calls the function 1000 times on 1000 different inputs.
This is exactly the reason that property-based testing fits so neatly into functional programming. In an imperative language, properties have to navigate state: you set up the world, you run the function, you compare the world before and after. In OCaml, the function takes its arguments and returns its result; the property is the relationship between the two, expressible as a single expression. There is no setup, no teardown, no shared state.
Better still: in a pure language, the property literally is the
spec. The OCaml expression rev (rev xs) = xs is the same
mathematical statement a textbook would write. You can read it
aloud, you can manipulate it equationally, you can prove things
about it. PBT inherits this clarity from the language. In
Python, where everything could mutate, the same property reads
"call rev on xs, save the result; call rev on that, save that
result; compare to the original xs; oh also check that nothing
secretly mutated xs along the way."
Negative properties: preconditions
Sometimes a property only holds for some inputs, not all. For
example, "the head of a list equals its first element" is true
only for non-empty lists. QCheck supports this with
preconditions via QCheck.assume:
When the generated input fails the precondition, QCheck skips that input and generates another. The case still counts as checked but does not exercise the property; the run above passes with every empty list silently discarded. If too many inputs fail the precondition (because the generator produces them rarely), QCheck gives up and warns you. That is your signal to write a custom generator that produces only inputs in the precondition's range, rather than rejecting most of what the default generator makes.
A second example: List.sort
The sort property is richer because "sorted" by itself is too weak
to pin down behaviour. A function that returns the empty list on
every input is "sorted" trivially. To capture List.sort we need
two properties: the output is sorted, AND the output is a
permutation of the input.
same_elements checks multiset equality by counting: the two
lists have the same length, and every element of the first occurs
the same number of times in both. Counting keeps the checker
independent of the function under test. (The tempting shortcut,
sort both lists and compare, uses a sort to specify sort; fine
when the yardstick sort is one you already trust, but the
count-based checker dodges the question entirely.) The checker is
quadratic, so the permutation test draws its inputs from
QCheck.(list_small int), a variant of list that keeps the
generated lists modest; QCheck.(list int) happily produces
lists thousands of elements long.
Three properties. Each one would be satisfied by some wrong function, but a function that satisfies all three is genuinely hard to write incorrectly.
- "produces a sorted list" by itself is satisfied by
fun _ -> []. - "preserves the multiset" by itself is satisfied by
fun xs -> xs(the identity). - "preserves length" by itself is satisfied by many functions
(the identity,
rev, ...).
Together, only sort-like functions survive. All three pass
against List.sort:
This is the art of property-based testing: choosing a small set of properties whose conjunction approximates the specification. For most data-structure operations, three or four properties suffice. For more complex code (a parser, a query planner), you might write a dozen. Each property is small; each is independent; each contributes one constraint.
Shrinking: finding the smallest counterexample
The most important feature of QCheck (and of QuickCheck more generally) is shrinking. When the library finds an input that fails the property, it does not just report "this 17-element list broke things." It actively tries to minimise the failing input: "can I delete one element and still see the failure? Can I delete another? Can I replace this element with a smaller one?" The result is the smallest failing input the library can find, typically a one or two element list, which is far easier to debug.
Let us see this in action with a deliberately buggy sort. A
common rookie mistake: forgetting the singleton case in a merge
sort. (merge, combining two sorted lists into one sorted list,
is the standard helper.)
If xs has length 1, then n = 0, left = [], right = xs. So
bad_sort [x] calls bad_sort [] (returns []) and bad_sort [x] (recurses), forever. Stack overflow on any non-empty input.
We point the "sorted" property from before at bad_sort:
Run it. QCheck explores random lists. Some are empty (returns empty, trivially sorted). Most are non-empty. The first non-empty list it generates causes a stack overflow. QCheck catches the exception, marks the test errored, and starts shrinking.
Shrinking proceeds roughly as follows:
- The original failing input was, say,
[3; -5; 0; 17; 42](5 elements). - Can we drop one? Try
[-5; 0; 17; 42](drop first). Fails. Smaller input. Continue. - Drop another. Try
[0; 17; 42]. Fails. Keep going. - Down to
[17; 42]. Fails. Smaller. - Down to
[42]. Fails. Smaller still. - Down to
[]. Passes (bad_sorthandles the empty case). So[]is not a counterexample: some one-element list is the minimum. The shrinker also shrinks the surviving element towards zero, and reports[0].
The output looks something like (the seed and the step count vary from run to run):
random seed: 449586813
=== Error ======================================================
Test bad_sort produces a sorted list errored on (68 shrink steps):
[0]
exception Stack overflow
================================================================
failure (0 tests failed, 1 tests errored, ran 1 tests)
Three characters of input. The bug is obvious now: "a single- element list overflows the stack." Without shrinking, the original failing input would have been dozens of elements wide, and the bug would have been buried in irrelevant noise. The shrinker is what makes PBT debuggable.
The shrinker is built into the library's generators. When you use
QCheck.(list int), the resulting arbitrary value carries a
shrinking strategy: drop elements, replace elements with smaller
ones. (A custom generator built with QCheck.make is the
exception: it carries no shrinker unless you pass one. A note
on that later in the lecture.)
For the level of this lecture, you almost never need to write a shrinker by hand. The built-in generators come with reasonable ones, and your time is better spent on properties than on shrinkers. We mention shrinking because it is what makes PBT output actionable, but the day-to-day reality is "use the default shrinker, point at your property, watch QCheck do the work."
A question to keep in the back of your mind: how do you know QCheck is exploring inputs usefully? If 90 percent of the lists it generated were length 0, a passing run would mean very little. We return to this later in the lecture, under "Beyond the built-in generators".
Equational reasoning gives properties for free
Coming back to why this works so naturally in OCaml: in a pure functional language, every law you can state in a textbook is a property you can hand to QCheck.
For lists:
List.length (xs @ ys) = List.length xs + List.length ysList.rev (xs @ ys) = List.rev ys @ List.rev xsList.map f (List.map g xs) = List.map (fun x -> f (g x)) xsList.fold_left f a (xs @ ys) = List.fold_left f (List.fold_left f a xs) ys
For functions:
(fun x -> f (g x)) = compose f gid (f x) = f xf (id x) = f x
For numeric operators:
x + y = y + x(x + y) + z = x + (y + z)
Each one is a single line of OCaml turned into a QCheck property. The equational nature of functional code means the laws ARE the properties. In an imperative language you would write "compute lhs, store it; compute rhs, store it; compare." Here the law is itself executable.
This is the deepest reason testing-by-property is more popular in the FP world than elsewhere. The language gives you a vocabulary of pure functions, and the testing library hands that vocabulary straight to a fuzzer.
Beyond the built-in generators
What we have so far is enough to write interesting properties for any pure function whose input fits one of QCheck's built-in generators (lists of ints, strings, pairs, options). Two pieces lie beyond this lecture.
First, steering the distribution. "1000 random lists" hides
an important question: which 1000? For a sorter, the telling
inputs are the already-sorted list, the reverse-sorted list,
lists full of ties, the empty and one-element lists; a uniformly
random (list int) visits all of these rarely, so a boundary
bug can survive 1000 trials. When the bugs you care about live
in a corner of the input space, you build a generator that
visits that corner on purpose, out of the QCheck.Gen
combinators, and you check your work with QCheck.collect,
which reports the distribution a generator actually produces.
The tutorial builds one such generator
for a recursive expression type; the QCheck documentation
covers the rest of that toolkit.
Second, state. Everything we tested today is a pure function. Much of real software is a stateful API: a queue you enqueue to and dequeue from, where each operation's behaviour depends on everything that came before. Writing properties for that needs one more idea, and it is the topic of the next lecture.
Shrinking, in depth
We introduced shrinking informally with the bad_sort example: a
failing multi-element list got shrunk to a 1-element list, and the
bug was obvious in the small witness. Now we go under the hood.
Why shrinking matters
When a generator finds a counterexample, it almost never finds
the minimal one. The first failing list might have 7 elements,
6 of which are irrelevant noise. The reported failure should be
"[3]", not "[42; -17; 0; 3; 99; -2; 8]". The difference is
the difference between a tractable bug report and an
intractable one.
Shrinking is the process of minimising the failing input while preserving the failure. It is a search procedure:
- Start with the witness the generator found.
- Propose a smaller candidate (drop an element, halve a number, ...).
- Re-run the property on the candidate.
- If the candidate fails, recurse with it as the new witness.
- If the candidate passes, try a different reduction.
- Stop when no further reduction fails.
The result is a local minimum of "things that still fail." Not necessarily the globally smallest, but small enough.
How QCheck shrinks each type
QCheck's built-in arbitrarys come with shrinking baked in. The
default behaviour for the standard types:
Integers. Shrink toward zero. Given n = 42, try 0, then
21, then 32 (half-way), and so on by bisection. If the bug
fires only on negative numbers, the shrinker walks -42 -> 0
-> -21 -> ..., converging on the smallest negative that still
fails (often -1).
Booleans. Shrink true to false. (false is the "smaller"
value in QCheck's convention, by alphabetical / numerical order
of constructors.)
Strings. Shrink by deleting characters; once minimum-length
is reached, shrink each remaining character (toward 'a' or
similar). A failing 20-character string converges to a 0 or 1
character string in a few steps.
Lists. Shrink in two phases:
- Structural. Try dropping single elements, then pairs, then halves. A failing list of length 7 might shrink to length 6, then 4, then 2, then 1.
- Element-wise. Once the list is at its minimum length, shrink each element individually (using the element's shrinker).
Pairs, options. Shrink each component independently, then the whole.
Sum types. If a value is Some x, try None. If it is
Inr y, try Inl .... Then recurse into the component.
The shrinkers compose: QCheck.(list (pair int (option string)))
inherits a shrinker that descends into the list, then into each
pair, then into each option, then into the integer or string,
all by minimising at each level.
The integer-shrink-toward-zero heuristic
A specific case worth pinning down. When QCheck shrinks an
integer, it does not try every smaller value (that would be
useless for min_int); it bisects toward zero. The sequence for
n = 100:
100 -> 0 (try the limit)
100 -> 50 (try the halfway point)
100 -> 75
...
If 0 already fails the property, the shrinker reports 0 and
stops. If 0 passes but 50 fails, the search converges
between 0 and 50. The result is logarithmic in the magnitude
of the original number, which means even very large
counterexamples shrink to small ones in a handful of steps.
The heuristic is a heuristic: it can miss the true minimum. If the bug fires only on prime numbers and the bisection avoids primes, the shrunk witness might be larger than the smallest failing prime. In practice this is rare, and the shrinker's local minimum is usually small enough to debug.
When you need a custom shrinker
One caveat to carry forward. If you build a custom generator
with QCheck.make and do not pass a shrinker, the resulting
arbitrary has no shrinking: QCheck reports the original
failing input verbatim, however large. The optional ~shrink
argument fixes that; it takes a function from a failing value to
the candidates to try in its place, usually a few lines built
from the QCheck.Shrink helpers. You will write exactly one in
this module, in the
model-based testing lecture,
where the failing input is a list of commands of a custom
variant type. Beyond that, the built-in shrinkers for int,
list, string, and friends handle nearly every case.
Why "smallest failure" is more useful than "first failure"
A final note on shrinking's value. The original QuickCheck paper made the point that random testing without shrinking is much less useful than random testing with shrinking, even though the random-generation work is the same. The reason: programmers don't care about a 200-character string that crashes the parser; they care about which character in that string caused the crash. Shrinking turns "huge counterexample" into "minimal counterexample," which is what a debugger needs.
If you take one thing from QCheck's API, take the shrinker. The generator does the search; the shrinker does the diagnosis.
When PBT does not help
PBT is not a universal solvent. There are cases where unit tests still win.
Specific known cases. "What does sort [3; 1; 2] return?" is
a unit test, not a property. PBT cannot tell you the answer to
that specific question.
Where the spec IS a list of cases. Some functions are defined by a finite table (the days of the week, the bytecode opcodes). There is no "for all" structure to abstract over; a hand-written case per row is more honest.
Cases where generators are expensive. If generating a valid input is itself a multi-step process (parse, normalise, validate), a hand-written corpus of representative inputs may be cheaper than a custom generator.
Cases where the property is just "the output equals the expected output." That is unit testing dressed in PBT clothes, without any of the benefit.
The right reflex is: unit tests for known specific cases; PBT for invariants. They are complementary, just like types and tests.
A short history
PBT was introduced by Koen Claessen and John Hughes in the QuickCheck library for Haskell, published in ICFP 2000. The idea has since been ported to dozens of languages: Hypothesis in Python, fast-check in JavaScript, proptest in Rust, and many others. OCaml's QCheck (by Simon Cruanes and contributors) is the direct descendant of QuickCheck, with OCaml- idiomatic conventions.
The intellectual contribution of QuickCheck was not the random testing itself: random testing predates QuickCheck by decades. It was the combination of random generation with automatic shrinking inside a strong type system. The type system makes it possible to auto-generate inputs for any type, and shrinking makes the counterexamples small enough to debug. Together they make PBT genuinely productive for working programmers, not just a research curiosity.
A recurring surprise in Hughes's later experience reports is that property-based testing often finds bugs not in the implementation but in the specification itself: the hand-written reference model that was supposed to be the oracle. Testing a real telecoms system against a formal model, his team repeatedly found that counterexamples pointed at mistakes in the model, not the code. This is the strongest argument for properties over hand-written cases: a property is forced to confront inputs you would never have thought to write down, including the ones that expose your own misunderstanding of what "correct" means.
Activity
A colleague writes a dedup : int list -> int list and a single
QCheck property:
let test = QCheck.Test.make QCheck.(list int)
(fun xs -> List.length (dedup xs) <= List.length xs)
The property passes on 1000 random inputs. What is the strongest
conclusion you can draw about dedup?
dedupis correct.dedupis incorrect; one property is never enough.dedupreturns lists of length at most the input length; many incorrect implementations also satisfy this property.dedupreturns the correct multiset of elements.
Why: the property only states that the output is no longer
than the input. fun _ -> [] satisfies it (length 0 is at most
any length). So does the identity. So does "return the first
element only." A single weak property cannot replace a
specification; you need several properties whose conjunction
constrains behaviour. Length-no-longer-than is one constraint;
"every element of the output appears in the input", "no
duplicates in the output", and "every input element appears in
the output" would together pin dedup down.
QCheck runs a property on a buggy function and reports:
Test failed on input: [3].
The function was tested with QCheck.(list int), which generates
random lists of random length. The reported input is a one-element
list, but the random input that triggered the bug was, the
seed log suggests, a 12-element list. What happened?
- QCheck regenerated the input from a smaller seed.
- QCheck shrunk the failing 12-element input down to the smallest input that still fails, namely
[3]. - The 12-element input did not actually fail; QCheck mis- reported.
- The bug fires randomly and the seed determines which inputs trigger it.
Why: shrinking is QCheck's process of repeatedly minimising
a failing input. After finding a 12-element counterexample, it
tries dropping elements, replacing elements with smaller ones,
and so on, until further reduction would make the property pass.
The reported [3] is the minimal witnessed failure. This is
the central feature that makes PBT actionable: the original
random failure is usually too noisy to debug; the shrunk
witness is usually the bug in its purest form.
Write a QCheck property that captures: concatenating the empty
list to any list yields the original list. Use the generator
QCheck.(list int). Name the property "empty is right identity for @".
The body of the test should be a function xs -> bool returning
true when the law holds.
Show reference solution
Reference solution:
Four lines. The property xs @ [] = xs is a one-line statement
of a list-monoid law; it is exactly the kind of equational
property functional code makes easy to state.
Common pitfalls
Pitfall 1: too few properties. A function with one property that "passes" is barely more tested than a function without properties at all. State several. The conjunction is what constrains behaviour.
Pitfall 2: tautological properties. Watch for properties that
hold of any function with the right type. fun xs -> List.length (rev xs) >= 0 is true of every list, including the wrong ones.
The property has to exclude implementations.
Pitfall 3: properties that depend on the implementation. "rev allocates a new list" depends on the implementation, not the spec. Phrase properties in terms of observable behaviour.
Pitfall 4: too-narrow generators. If QCheck.small_int only
produces 0-9 but the bug fires on min_int, the bug will not be
caught. Default generators are reasonable; custom ones need
thought.
Pitfall 5: confusing "passed 1000 cases" with "proved". PBT gives you evidence, not proof. A bug that fires only on inputs with a specific shape (e.g. a deeply nested record) may evade 1000 random inputs. PBT is a sieve, not a verifier.
What's next
So far the things we have tested are pure functions: an input
goes in, an output comes out, no state. PBT shines there. But
much of real software is stateful: a hash table you add to and
remove from, a queue you enqueue and dequeue, a file you
read and write. How do you write a property for a stateful
data structure, where each operation depends on every operation
that came before?
Lecture 6 answers with model-based testing: test a stateful implementation against a simple reference implementation, by generating random sequences of operations and asserting observable equivalence at each step. It is the canonical PBT pattern for stateful code, and it cleanly extends what we have built in this lecture.
Lecture 7 puts unit testing and property-based testing together on a single, larger example: a small arithmetic evaluator (a float cousin of the pattern-matching tutorial's interpreter), with a deliberately broken implementation that QCheck finds in seconds.
Reading
- QCheck, the OCaml property-based testing library used in this lecture. README, tutorial, and API docs are all in the upstream repository: https://github.com/c-cube/qcheck
- QCheck API documentation, generated from the source: https://c-cube.github.io/qcheck/
- Claessen and Hughes, QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs, ICFP 2000. The paper that introduced PBT and the combination of random generation with automatic shrinking: https://www.cs.tufts.edu/~nr/cs257/archive/john-hughes/quick.pdf
- Cornell CS3110, Randomized testing with QCheck. Source for the abstraction layering (generator, arbitrary, property) used in this lecture: https://cs3110.github.io/textbook/chapters/correctness/randomized.html
- Cornell CS3110, Black-box and glass-box testing. The framework of "paths through the specification" we used in the input-space section: https://cs3110.github.io/textbook/chapters/correctness/black_glass_box.html
- Real World OCaml, Testing. The Quickcheck section
discusses distribution choice and how
ppx_quickcheckderives generators automatically: https://dev.realworldocaml.org/testing.html
Sources
This lecture's prose, worked examples, and quizzes are original
to this course. The QCheck library by Simon Cruanes and
contributors is BSD-2-Clause licensed; we link to its
repository and use its public API surface, with no derivative
reuse of its prose. The QuickCheck origin paper (Claessen and
Hughes) is the canonical reference for the technique itself;
we cite it for context. Cornell CS3110's randomized-testing
and black-box-testing chapters are CC BY-NC-ND licensed and
have not been derivatively reused; the paths through the
specification framing we use in the input-space section
follows their pedagogical sequence with our own examples and
prose, and we link to both for further reading. Real World
OCaml's Testing chapter has a parallel discussion of Quickcheck
distribution choice that we link to but have not reused. The
bad_sort example (missing singleton case) and the sorted-list,
operation-based-BST, and DAG-by-vertex-order generator recipes
are folklore in the PBT community, presented here in our own
words and OCaml.
See LICENSES.md
at the repository root for the full source posture.