Specifications and invariants
The previous lecture ended on a claim that sounds obvious until
you try to act on it: a test checks code against what the code is
supposed to do. The sort that returned its input unchanged was
wrong, not because of anything in the code itself, but because of
a sentence that exists outside the code: "sort xs is xs
arranged in ascending order." The compiler never saw that
sentence. The test assert (sort [3; 1; 2] = [1; 2; 3]) is that
sentence, translated into something executable.
So before this module reaches for any testing tool, we should ask: where is "supposed to" written down? What does it look like when it is written down well? That written-down intent is called a specification, and this lecture is about writing them: first for functions, where a specification is a contract about inputs and outputs, and then for data abstractions, where the interesting intent is internal, an invariant the client never sees but every operation must preserve.
None of this requires a new library. Everything in this lecture is comments and ordinary OCaml. But it is the load-bearing layer under everything else in the module: the next lecture derives test cases from specifications, and the property-based testing lectures turn specification clauses into executable properties.
A contract between two parties
A specification is a contract between the client of a piece of code and its implementer. The client is whoever calls the function; the implementer is whoever wrote it. Often both are you, two weeks apart, which does not make the contract less useful: the you of today remembers what the function promises, and the you of two weeks from now will not.
The contract states two kinds of obligation. What the client must guarantee about the inputs: these are preconditions. And what the implementer must guarantee about the outputs, assuming the preconditions held: these are postconditions.
The payoff of writing the contract down is blame assignment. When something goes wrong, the contract tells you whose bug it is. If the client passed an input that violates the precondition, the client is at fault, and no change to the implementation is needed. If the precondition held and the output still violates the postcondition, the implementation is at fault. Without a contract, every bug report starts a negotiation; with one, it starts a lookup.
In OCaml, specifications conventionally live in documentation
comments, (** ... *), attached to declarations in a module
signature. The client reads the signature; the signature carries
the contract. You saw the mechanics of signatures in the modules
material; this lecture is about what the comments should say.
The returns clause
The first sentence of a good function spec says what the result is. Not how it is computed, what it is. The OCaml convention is to write the function applied to named arguments, in square brackets, and then state the result:
(** [index_of x xs] is the position of the first occurrence of
[x] in [xs], counting from zero. *)
val index_of : 'a -> 'a list -> int
This is called the returns clause, and it is a postcondition: given the inputs, it pins down the output. Notice the word "is". A returns clause is definitional: it defines the result in terms of the inputs. Compare a version you will see in real codebases:
(** Walks down the list, comparing each element against [x],
incrementing a counter, until it finds a match. *)
That is an operational description: it narrates the implementation. It tells the client nothing the implementation does not already say, it goes stale the moment the implementation changes, and it does not actually promise anything about the result. Definitional, not operational, is the single biggest upgrade you can make to your documentation habits.
A returns clause is often clearer with a concrete example attached. The examples clause costs one line and saves the client a careful read:
(** [index_of x xs] is the position of the first occurrence of
[x] in [xs], counting from zero.
Example: [index_of "b" ["a"; "b"; "c"] = 1]. *)
The requires clause
The index_of spec above has a hole, and it is exactly the kind
of hole specifications exist to close: what if x does not occur
in xs at all? The function as specified is partial: it is
only meaningful on part of its input space. Every partial
function forces the spec writer to make a decision, and there are
three honest options.
The first option is to restrict the domain. Add a requires clause, a precondition, declaring those inputs out of bounds:
(** [index_of x xs] is the position of the first occurrence of
[x] in [xs], counting from zero.
Requires: [x] occurs in [xs]. *)
Now the contract says: if you call this with an x not in xs,
anything may happen. An exception, a wrong answer, an infinite
loop; all of it is the client's fault, by contract. The
implementer gains freedom and speed (no checking); the client
gains an obligation. This is a reasonable trade exactly when the
client can establish the precondition cheaply.
Raises, or change the type
The second option is to make the function total by raising: it now has defined behaviour on every input, and the spec must say which exception and when. That is the raises clause:
(** [index_of x xs] is the position of the first occurrence of
[x] in [xs], counting from zero.
Raises: [Not_found] if [x] does not occur in [xs]. *)
Note the obligation this places on the implementer: the check
for "not found" must actually happen, in every build, because some
client may rely on catching Not_found. A raises clause is a
promise, not a warning.
The third option is the most OCaml-flavoured one, and you have been using it since the module on data types: change the type so the partiality shows up in the signature itself.
(** [index_of_opt x xs] is [Some i] where [i] is the position of
the first occurrence of [x] in [xs], or [None] if [x] does
not occur. *)
val index_of_opt : 'a -> 'a list -> int option
Now the "missing" case is not a comment; it is a constructor. The
client cannot forget it, because pattern-match exhaustiveness
will not let them. This is the standard library's own resolution:
List.find raises, List.find_opt returns an option, and the
pair of them is the raises-versus-types decision offered as a
menu. The spec comment does not disappear (you still must say
which occurrence and what counting convention), but one
clause of it has migrated into the type checker's jurisdiction.
The specification game
How do you know whether a spec you wrote is any good? CS3110 teaches a game for this, played between a specifier and a devious implementer. The implementer's goal is to satisfy the letter of the spec while being as unhelpful as possible. Every move the implementer makes exposes a hole; the specifier patches the hole; repeat. Recall the deduplication function from the previous lecture's activity, and watch the game run.
Round 1. The specifier writes:
(** [dedup xs] returns a list. *)
val dedup : 'a list -> 'a list
The devious implementer answers let dedup _ = []. It returns a
list. Contract satisfied; client betrayed.
Round 2. The specifier patches:
(** [dedup xs] returns a list containing no duplicates. *)
The implementer does not even need to change anything: []
contains no duplicates. (Vacuously, like every property of the
elements of an empty list.)
Round 3. The specifier patches again:
(** [dedup xs] returns a list containing exactly the distinct
elements of [xs], each exactly once. *)
Now [] is ruled out, as is the identity function (when xs
has duplicates). The devious implementer has one move left:
return the right elements in a strange order, say reversed, or
sorted. Is that a bug? The spec does not say. And here the game
teaches its subtlest lesson: this is a choice, not
automatically a hole. If the specifier does not care about
order, leaving it open is good practice; it keeps the spec
sufficiently general and lets the implementer pick the fastest
strategy (sorting, hashing). If the client needs first-occurrence
order, the specifier must say so:
(** [dedup xs] is the list of distinct elements of [xs], in
order of first occurrence.
Example: [dedup [2; 1; 2; 3] = [2; 1; 3]]. *)
A good specification balances two pressures: sufficiently restrictive, so useless implementations are ruled out, and sufficiently general, so useful ones are not. The game finds the first kind of failure; only thinking about real clients finds the second.
Data abstractions: where the contract gets interesting
For a single function, the contract talks about inputs and outputs, and that is the whole story. For a data abstraction, a module exporting an abstract type and operations on it, there is more to say, because the most important facts are ones the client cannot see.
Recall the shape from the modules material: a signature declares
type t without a definition, and the structure picks a concrete
representation the client never learns. Here is the running
example for the rest of this lecture, an abstraction for rational
numbers:
The obvious representation is a pair of integers, the numerator and the denominator. Here is a first implementation:
It behaves the way the contract promises:
The representation is int * int. But the moment we pick it, two
questions appear that the function-spec vocabulary cannot answer.
First: what does a concrete value mean? The pair (1, 2) is
just two integers; we intend it as the rational one-half. Is
(2, 4) also one-half? Is (3, 0) anything at all?
Second: which concrete values are legal? If (3, 0) is
nonsense (there is no rational 3/0), then every operation in the
module must avoid ever creating it, and may assume it never
receives it. That assumption is invisible in the types: the type
of the representation is int * int, and (3, 0) inhabits it
happily.
These two questions have names, and writing down their answers is the module-level analogue of writing a function spec.
The abstraction function
The abstraction function (AF) answers the first question: it
maps each concrete representation value to the abstract value it
stands for. It is written as a comment on the representation
type, because it is for the implementer and maintainer, not the
client. In Rational it reads: "(p, q) represents the rational
number p/q."
Two properties of the abstraction function deserve attention, because both have consequences you can observe from the client side.
The AF is many-to-one. The pairs (1, 2), (2, 4), and
(3, 6) all represent one-half. That is why the signature
insists on an equal operation: the built-in = on the
representation would call (1, 2) and (2, 4) different, but
as rationals they are the same value, and equal (by
cross-multiplication: p1 * q2 = p2 * q1) agrees:
The AF is partial. The pair (3, 0) maps to nothing; there is
no such rational. Which brings us to the second question.
The representation invariant
The representation invariant (RI) answers the second question:
it is the predicate that separates legal representations from
illegal ones. For Rational, the RI is one clause: q <> 0.
The RI is a contract too, but an internal one, among the
operations of the module. Every operation may assume the RI
holds of any t it receives, and must guarantee the RI
holds of any t it creates. If both halves hold for every
operation, then no client, composing operations from outside,
can ever get their hands on an illegal value. The proof is an
induction over the client's program, but the intuition is
enough: legal values in, legal values out, forever.
The assume half is what makes the RI useful: add does not
check its arguments for zero denominators, and it does not need
to, because no legal t has one. The guarantee half is what
makes the RI true: make rejects q = 0 at the boundary, and
each operation must produce only legal pairs.
This is also why the type being abstract matters so much. If
the signature exposed type t = int * int, a client could write
(3, 0) directly, no operation would ever have rejected it, and
every function in the module would now be wrong through no fault
of its own. Abstraction is what gives the module a boundary at
which the invariant can be enforced. You met this idea as
"encapsulation" in the modules material; the RI is the precise
statement of what the encapsulation is protecting.
rep_ok: the invariant as code
An invariant in a comment is a promise; an invariant in code is a
check. The convention, taken from CS3110, is a function rep_ok
that returns its argument unchanged if the RI holds and raises
otherwise. Here is Rational again with the check threaded
through every operation that creates a representation value:
The shape is worth memorising: identity function, applied at every point where a new representation value is born. It costs one line per operation and it converts a silent corruption into a loud failure at the moment of corruption.
To see what that buys, look closely at div. It computes
(p1 * q2, q1 * p2). Perfectly correct, except for one input:
dividing by the rational zero. If r2 is (0, 5), the result
denominator is q1 * 0 = 0, an illegal value. The fault is in
div (its spec said Raises: Failure if r2 is zero, and this
implementation forgot to check). Watch rep_ok catch it:
Recall the faults-versus-failures vocabulary from the previous
lecture. Without rep_ok, the fault stays silent: div returns
(2, 0), the client passes it onward, and the failure finally
surfaces three calls later, in to_string output like "2/0",
or in an equal that returns nonsense, far from the fault. With
rep_ok, the failure happens inside div, in the same stack
frame as the fault. The invariant check is a fault-localisation
device: it cannot fix the bug, but it tells you whose stack
frame it died in.
(The correct div, for the record, checks p2 = 0 and raises
Failure "div: divide by zero" as its spec promises. The point
here is what happens on the day you forget.)
Strengthening the invariant: canonical form
The RI is not handed down; the implementer chooses it, and the
choice is a real design decision with observable consequences.
Our Rational keeps the weak invariant q <> 0 and pays for it
elsewhere: equal must cross-multiply, and to_string happily
prints "2/4", which a client comparing strings might not
expect.
The alternative is to strengthen the RI to demand a canonical
form: the fraction is always in lowest terms, and the
denominator is always positive. Two helpers do the work. norm
maps any pair with a nonzero denominator to its canonical
representative; canon_ok is this representation's rep_ok,
written outside the module so it fits beside norm on one
slide:
The module itself then pipes every producer through
canon_ok (norm ...):
Look at what moved. The work migrated into norm, which every
producer now calls; in exchange, equal collapsed to the
built-in = (each abstract value now has exactly one
representation, so structural equality is abstract equality),
and to_string prints what a human expects. The AF became
one-to-one. The same external contract, the RATIONAL
signature, is satisfied by both modules; the client cannot tell
which one they are using except by performance. That is
abstraction doing its job.
Neither choice is "right". The weak RI does less work per operation; the strong RI makes observation and comparison trivial. What is not optional is writing the choice down, because every operation in the module is written against it.
Specifications are what tests check
Step back and look at what this lecture actually built, because the rest of the module stands on it. Every clause of a specification is an obligation, and every obligation is something a test can check:
- A returns clause plus an examples clause hand you concrete
test cases nearly verbatim:
index_of "b" ["a"; "b"; "c"] = 1wants to be an assertion. - A raises clause is a negative test: call it with the bad input, assert that exactly the promised exception comes out.
- A requires clause tells you which inputs not to waste tests on, and warns you that nothing is promised there.
- A representation invariant is a property every operation must
preserve;
rep_okis the checking code, and a test suite can hammer the module with operations whilerep_okstands guard.
The next lecture asks the question this one leaves open: the spec tells you what to check, but the input space is astronomically large; which inputs do you actually try? And when we reach property-based testing, the specification returns in executable form: a QCheck property is a postcondition that a machine checks on hundreds of random inputs.
Activity
A specifier writes: "[dedup xs] returns a list containing no
duplicates." Which devious implementation satisfies this
specification while being useless?
let dedup xs = xslet dedup _ = []let dedup xs = List.rev xs- None: the specification already pins down the behaviour.
Why: the empty list contains no duplicates, vacuously, so
let dedup _ = [] satisfies the letter of the spec for every
input. The identity and List.rev do not satisfy it: when
xs contains duplicates, both return a list that still has
them. The spec's hole is that it never requires the result to
contain the elements of xs; the patched version reads
"exactly the distinct elements of [xs]", and a fully pinned
version also fixes the order.
A module represents rationals as int * int with this
documented invariant: "RI: the denominator is positive and the
fraction is in lowest terms." Which concrete value satisfies
the representation invariant?
(2, 4)(1, -2)(-1, 2)(5, 0)
Why: (-1, 2) has a positive denominator and
gcd 1 2 = 1, so it is in lowest terms; the sign living in the
numerator is exactly what the canonical form prescribes.
(2, 4) is not in lowest terms (gcd 2 4 = 2); (1, -2) has
a negative denominator; (5, 0) is not a rational at all. Note
what the exercise demonstrates: checking the RI requires only
the value and the documented invariant, never the operations.
That is why rep_ok is easy to write and worth writing.
An interval type stores a record { lo : int; hi : int } with
the representation invariant lo <= hi. Write
rep_ok : interval -> interval in the CS3110 style: return the
argument unchanged when the invariant holds, raise Failure
(with any message) when it does not.
Show reference solution
Reference solution:
The boundary case lo = hi (an empty-width interval like
{ lo = 2; hi = 2 }) satisfies lo <= hi and must pass; if you
wrote < instead of <=, the second assertion catches it. The
shape is always the same: test the documented predicate, return
the value, raise on violation.
Common pitfalls
Pitfall 1: operational specs. "Walks the list comparing each element" narrates the implementation; it promises nothing and goes stale on the first rewrite. Write what the result is.
Pitfall 2: copying the signature's specs into the
implementation. The structure should not repeat the (** ... *)
comments from the signature. Two copies drift apart, and the
drifted copy is a trap for the next maintainer. The
implementation's own comments are for the AF, the RI, and
algorithm notes: the things the signature cannot say.
Pitfall 3: preconditions the client cannot check. "Requires:
xs is sorted" is fine; the client can sort or can know it
sorted. "Requires: the result will be used responsibly" is not a
precondition, it is a wish. If you cannot state the requires
clause as a predicate on the arguments, it does not belong in
the contract.
Pitfall 4: a spec that is silent on the boundary. What does
index_of do on the empty list? What does dedup do on []?
If the spec does not say, the implementer will decide by
accident, and the client will discover the decision in
production. The next lecture turns this observation into a
test-design method: the boundaries the spec mentions, and the
ones it forgot, are exactly where test cases earn their keep.
Pitfall 5: rep_ok that never comes off. A linear-time
rep_ok wrapped around every constant-time operation changes
the module's complexity class. The CS3110 convention is to keep
the expensive check during development and debugging, and swap
in a cheap or identity version for production, keeping the
expensive code in the file so it can be reinstated the day a
representation bug is suspected.
What's next
The specification says what to check. The next lecture confronts the question that makes testing a craft rather than a chore: the input space of even a tiny function is astronomically larger than any test suite, so which inputs do you choose? The answer has two halves: read the boundaries out of the specification (black-box testing), and read the paths out of the implementation (glass-box testing). The clauses you learned to write today, including the boundaries your spec mentions and the requires clauses it imposes, are exactly the raw material the choosing works from.
Reading
- Cornell CS3110, Specifications: https://cs3110.github.io/textbook/chapters/correctness/specifications.html
- Cornell CS3110, Function Documentation: https://cs3110.github.io/textbook/chapters/correctness/function_docs.html
- Cornell CS3110, Module Documentation (abstraction functions and representation invariants): https://cs3110.github.io/textbook/chapters/correctness/module_docs.html
Sources
This lecture's prose, worked examples, and quizzes are original
to this course. Cornell CS3110's chapters on specifications,
function documentation, and module documentation are the
primary conceptual sources for the clause vocabulary, the
specification game, and the AF/RI/rep_ok discipline; CS3110's
prose is CC BY-NC-ND licensed and has not been derivatively
reused. The rational-number data abstraction is a standard
teaching example; the Rational modules, the dedup
specification game instance, and the interval activity are this
course's own. See
LICENSES.md
at the repository root for the full source posture.