Unit testing

Functional Programming with OCaml


Module 9 · Lecture 4

KC Sivaramakrishnan
IIT Madras

A unit test checks one unit of code, a single function or a single module, exercised in isolation against its specification. Each check is small, fast, and repeatable: fixed inputs, an expected result, and a comparison between the two. Unit testing is the practice of writing many such checks and running them together. It is the most local level of testing, as opposed to integration testing (do the functions work together?) and system testing (does the whole program meet its requirements?). Bugs are cheapest to find at the unit level, because a failing unit test points at one function, not at a whole pipeline.


The previous two lectures produced case tables: a contract to check against (specifications), and the inputs worth checking (test design). So far we have run those cases as bare asserts, which is fine for a handful on one page and unworkable for a real project: the first failed assert halts everything, tells you a line number and nothing else, and there is no way to run one suite among many.
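To make the problem concrete, here is a minimal sketch in plain OCaml, with no framework: a hypothetical buggy_abs (wrong for negative inputs) checked row by row against a small case table with bare asserts. The names are illustrative and are not part of the lecture's code.

```ocaml
(* Hypothetical code under test: deliberately wrong for negative inputs. *)
let buggy_abs x = if x < 0 then x else x

(* A small case table of (input, expected) pairs. *)
let table = [ (3, 3); (-2, 2); (0, 0); (-5, 5) ]

(* How many rows were actually checked before the run stopped. *)
let checked = ref 0

let outcome =
  try
    List.iter
      (fun (input, expected) ->
        incr checked;
        (* A bare assert: the first failure raises and ends the whole loop. *)
        assert (buggy_abs input = expected))
      table;
    "all passed"
  with Assert_failure (file, line, _) ->
    (* All we learn is a file and a line, not which row failed. *)
    Printf.sprintf "stopped: Assert_failure at %s, line %d" file line
```

Two rows are wrong, but the run stops at the first one: the rows after it are never checked, and the report names a source line rather than the failing case.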

This lecture turns a case table into something a project can live with: a suite of named unit tests, run by a runner that reports which ones failed, wired into the build so the tests run on every change. The tool we use is OUnit2, but the tool is incidental. The vocabulary (case, suite, fixture) and the ideas (independence, positive and negative cases, the build gate) transfer to Alcotest, to ppx_inline_test, to JUnit and pytest. We pick one and use it; skim the others when you meet them.


Three words

Three words recur across every testing framework, and they mean the same thing everywhere. Worth pinning down.

A test case is a single named scenario: a fixed input, an expected result, and the comparison between them. "Push three items, then peek, and see the third" is one case.

A test suite is a collection of cases run together. The suite is the unit of execution: you run a suite, and the framework reports how many passed, how many failed, and where.

A test fixture is the setup (and teardown) that runs around each case so it starts from a known state. For a stack, the fixture is "a fresh empty stack." Fixtures keep cases short and, crucially, independent.
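The three words can be sketched as plain OCaml types. This is a hypothetical miniature for illustration only (the record types case and suite and the fixture fresh_counter are made up here; OUnit2's real types differ): a case pairs a name with a check, a suite is a named list of cases, and the fixture is a function each case calls to get its own fresh state.

```ocaml
(* Test case: a name plus a check that raises if the comparison fails. *)
type case = { name : string; check : unit -> unit }

(* Test suite: a named collection of cases, run as one unit. *)
type suite = { suite_name : string; cases : case list }

(* Test fixture: builds a fresh, known state each time it is called. *)
let fresh_counter () = ref 0

let counter_suite =
  { suite_name = "counter";
    cases =
      [ { name = "starts at zero";
          check = (fun () -> let c = fresh_counter () in assert (!c = 0)) };
        { name = "incr adds one";
          check = (fun () -> let c = fresh_counter () in incr c; assert (!c = 1)) } ] }

(* Run one case: true if its check completes without raising. *)
let passed (c : case) = try c.check (); true with _ -> false

(* Run the whole suite, keeping each case's name next to its result. *)
let results = List.map (fun c -> (c.name, passed c)) counter_suite.cases
```

Because each case calls fresh_counter () itself, "incr adds one" cannot disturb "starts at zero", whichever runs first.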

Term          Meaning
Test case     One named scenario: input + expected result + comparison.
Test suite    A collection of cases run as one unit.
Test fixture  Setup/teardown around each case, giving it a known state.

From a case table to a suite

Here is the smallest possible suite. It is the bare assert turned into a named case, collected into a suite, handed to a runner:

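open OUnit2

(* One case: the argument is the test context OUnit2 passes in, unused here. *)
let test_addition _ = assert_equal 4 (2 + 2)

(* A named suite holding one named case. *)
let suite = "arithmetic" >::: [ "two plus two" >:: test_addition ]

(* Run the suite and print the report. *)
let () = run_test_tt_main suite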

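Four moving parts, and they are the whole framework: assert_equal expected actual is the comparison, and fails the case if the two differ; name >:: f turns a function into a named test case (the function receives a test context from OUnit2, ignored here as _); name >::: cases collects cases into a named suite; and run_test_tt_main runs the suite and prints the report.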

Run it and the runner prints one dot (one passing case) followed by OK. The point of the name is that when a case fails six months from now, the report says which scenario broke, not just "an assertion on line 273."


A note on failure messages: assert_equal has no generic way to print arbitrary OCaml values, so without help a failed comparison reports only "not equal", with no values. Pass ~printer and the report shows both sides ("expected: 4 but got: 5"): assert_equal ~printer:string_of_int 4 (2 + 2). Make it a habit.
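To see why the printer matters, here is a hand-rolled check_equal. It is a hypothetical stand-in, not OUnit2's implementation; its two messages imitate what assert_equal reports with and without ~printer, but treat the exact wording as an assumption.

```ocaml
(* Hypothetical helper: compare two values; on mismatch, fail with a message
   that includes the values only if the caller supplied a printer. *)
let check_equal ?printer expected actual =
  if expected <> actual then
    match printer with
    | None -> failwith "not equal"
    | Some p ->
        failwith (Printf.sprintf "expected: %s but got: %s" (p expected) (p actual))

(* Capture the failure message, if any, so it can be inspected. *)
let message_of f = try f (); None with Failure m -> Some m

(* The same wrong answer, reported without and with a printer. *)
let bare = message_of (fun () -> check_equal 4 (2 + 3))
let printed = message_of (fun () -> check_equal ~printer:string_of_int 4 (2 + 3))
```

Without a printer the generic comparison has nothing to show, since OCaml offers no generic value-to-string function; with string_of_int the message carries both numbers.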

Tests as a build gate

The real payoff is not running tests by hand; it is running them automatically, on every change. In a dune project the tests live in their own directory with a (test ...) stanza:

(test
 (name test_stack)
 (libraries ounit2 my_library))

dune runtest builds whatever is stale, runs the test executable, and shows its output. If a test fails, dune exits non-zero, and that is the whole point: a failing test breaks the build, exactly the way a type error does. Wire dune runtest into continuous integration and a broken contract can no longer reach main unnoticed. Types and tests become the same kind of gate: both run on every change, both refuse to let a known problem through.
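From dune's side, the contract for a plain (test) stanza like the one above is the exit status of the test executable. A minimal sketch, assuming a hypothetical run_checks helper (OUnit2's run_test_tt_main does the equivalent for you): run every check, report failures on stderr, and turn the result into an exit code.

```ocaml
(* Hypothetical runner: each check is a name and a predicate. *)
let run_checks (checks : (string * (unit -> bool)) list) =
  let failed =
    checks
    |> List.filter (fun (_, check) -> not (check ()))
    |> List.map fst
  in
  (* Report every failure, not just the first. *)
  List.iter (fun name -> prerr_endline ("FAILED: " ^ name)) failed;
  (* 0 tells the build tool "pass"; any non-zero value breaks the build. *)
  if failed = [] then 0 else 1

let code =
  run_checks
    [ ("two plus two", fun () -> 2 + 2 = 4);
      ("deliberately wrong", fun () -> 2 + 2 = 5) ]
```

In a real test executable the last line would be let () = exit code; the sketch returns the code instead so it can be inspected.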


To watch the gate work, run dune runtest in a project with a passing suite, then break one expected value in the test file and run it again: the suite goes red and dune's exit status flips from zero to non-zero.

Each case from a known state

When the code under test is stateful (a mutable stack, a file, a connection), each case must start from a known state, or cases contaminate one another. The OCaml idiom is simple: a helper that builds a fresh value, called at the top of every case.

exception Empty

module Stack = struct
  type 'a t = { mutable items : 'a list }

  let create () = { items = [] }
  let is_empty s = s.items = []
  let push x s = s.items <- x :: s.items

  let pop s =
    match s.items with
    | [] -> raise Empty
    | x :: rest -> s.items <- rest; x

  let peek s =
    match s.items with
    | [] -> raise Empty
    | x :: _ -> x
end

This is the Stack from the modules material, in its value-oriented shape: every operation takes a Stack.t, so each case can call Stack.create () to get its own. (On an empty stack, pop and peek raise Empty, which gives us an error path to test.)
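Why a fresh value per case matters can be shown without any framework. The sketch below uses a cut-down copy of the lecture's Stack (only create, is_empty and push) and two hypothetical cases; run_shared hands every case the same stack, while run_fresh gives each case its own Stack.create ().

```ocaml
(* Cut-down copy of the lecture's value-oriented Stack. *)
module Stack = struct
  type 'a t = { mutable items : 'a list }
  let create () = { items = [] }
  let is_empty s = s.items = []
  let push x s = s.items <- x :: s.items
end

(* Two hypothetical cases, each written against the stack it is handed. *)
let case_push_nonempty s = Stack.push 1 s; not (Stack.is_empty s)
let case_starts_empty s = Stack.is_empty s

(* Shared fixture: one stack for every case, so earlier cases leave residue. *)
let run_shared cases =
  let shared = Stack.create () in
  List.map (fun case -> case shared) cases

(* Fresh fixture: every case starts from its own empty stack. *)
let run_fresh cases = List.map (fun case -> case (Stack.create ())) cases
```

With a shared stack, whether "starts empty" passes depends on which case ran before it; with a fresh stack per case, the order no longer matters.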


Positive and negative cases

A good suite tests both what the function should accept and what it should reject. Positive cases check normal behaviour; negative cases check the error paths, which are easy to forget and are where many bugs hide.

open OUnit2

let test_lifo _ =
  let s = Stack.create () in
  Stack.push 1 s;
  Stack.push 2 s;
  Stack.push 3 s;
  assert_equal ~printer:string_of_int 3 (Stack.pop s);
  assert_equal ~printer:string_of_int 2 (Stack.pop s);
  assert_equal ~printer:string_of_int 1 (Stack.pop s)

let test_pop_empty_raises _ =
  let s = Stack.create () in
  assert_raises Empty (fun () -> Stack.pop s)

let suite =
  "stack" >::: [
    "LIFO order" >:: test_lifo;
    "pop empty raises" >:: test_pop_empty_raises;
  ]

Handing suite to run_test_tt_main (as in the arithmetic example above) reports both cases passing:

..
Ran: 2 tests in: 0.00 seconds.
OK

Two things to notice. test_lifo makes several assertions on one setup. That is fine: it is one behaviour (LIFO) expressed with several checks, and because each check has a printer, a failure shows which value came out wrong. And assert_raises takes a thunk, fun () -> Stack.pop s, not Stack.pop s: the framework needs an unevaluated computation it can run inside its own try ... with, so it can catch the exception and compare it. Passing Stack.pop s directly would evaluate the pop, and raise Empty, before assert_raises ever ran.
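The thunk requirement is easiest to see in a hand-rolled version. raises below is a hypothetical helper that mirrors the idea behind assert_raises (it is not OUnit2's code), and first is a made-up list function standing in for Stack.pop: the helper can only catch the exception if it is the one that runs the computation.

```ocaml
exception Empty

(* Hypothetical helper: run the thunk ourselves, inside our own handler. *)
let raises expected (thunk : unit -> 'a) =
  match thunk () with
  | _ -> false                    (* returned normally: nothing was raised *)
  | exception e -> e = expected   (* raised: is it the exception we wanted? *)

(* Made-up function with an error path, standing in for Stack.pop. *)
let first = function [] -> raise Empty | x :: _ -> x

let ok = raises Empty (fun () -> first [])
let no_exception = raises Empty (fun () -> first [ 1; 2 ])
let wrong_exception = raises Not_found (fun () -> first [])
```

Calling raises Empty (first []) instead would evaluate first [] while building the argument, so Empty would escape before raises ever got control.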


When a function breaks, the report points at it

Suppose someone "refactors" push into a no-op:

let push _ _ = ()        (* DELIBERATELY BROKEN *)

Re-run the two-case suite. With push doing nothing, the stack stays empty, so the first pop in the "LIFO order" case raises Empty; the "pop empty raises" case never pushes, so it is unaffected:

E.
==============================================================================
Error: stack:0:LIFO order.
  ...
  Empty
------------------------------------------------------------------------------
Ran: 2 tests in: 0.00 seconds.
FAILED: Cases: 2  Tried: 2  Errors: 1  Failures: 0  Skip: 0  Todo: 0

E. reads left to right: the first case errored, the second (.) passed. OUnit2 marks an unexpected exception E and a failed comparison F; either way the case is red, and the report names it (stack:0:LIFO order). The green case tells you the break is confined to the push path. That localisation is the runner earning its keep, and the reason a named suite beats a wall of asserts, where the first failure halts the rest.
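The runner's key move, recording each case's outcome instead of letting it halt the run, fits in a few lines. This is a hypothetical sketch, not OUnit2's internals: Check_failed stands in for a failed comparison (an F), any other exception counts as an error (an E), and a list-based stack with a deliberately broken push reproduces the E. line above.

```ocaml
exception Empty
exception Check_failed of string   (* hypothetical: raised by a failed comparison *)

type outcome = Passed | Failed of string | Errored of exn

(* Run one case and classify how it ended; never let it escape. *)
let run_case (f : unit -> unit) =
  match f () with
  | () -> Passed
  | exception Check_failed msg -> Failed msg
  | exception e -> Errored e

let mark = function Passed -> "." | Failed _ -> "F" | Errored _ -> "E"

(* Run every case in order; one bad case never stops the others. *)
let run_suite (cases : (string * (unit -> unit)) list) =
  let outcomes = List.map (fun (name, f) -> (name, run_case f)) cases in
  let progress = String.concat "" (List.map (fun (_, o) -> mark o) outcomes) in
  (progress, outcomes)

(* A list-based stack with a DELIBERATELY BROKEN push that drops x. *)
let push_broken _x s = s
let pop = function [] -> raise Empty | x :: rest -> (x, rest)

let lifo_case () =
  let s = push_broken 1 [] in
  let x, _ = pop s in                        (* raises Empty: push did nothing *)
  if x <> 1 then raise (Check_failed "expected 1")

let pop_empty_case () =
  match pop [] with
  | _ -> raise (Check_failed "expected Empty")
  | exception Empty -> ()

let progress, outcomes =
  run_suite [ ("LIFO order", lifo_case); ("pop empty raises", pop_empty_case) ]
```

Here progress is "E.": the first case errored with Empty, and the second still ran and passed.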


What a runner does, and does not, do

OUnit2 (and its peers) runs named cases and reports failures. That is the whole job. Three things it deliberately does not do, each handled elsewhere in this module:

One sibling of the unit test is worth naming: the expect test. Instead of writing the expected value by hand, you run the case once, let the framework capture the actual output, read it, and keep the captured version; from then on, the case fails whenever the output changes. It is a unit test whose expected side is recorded rather than written, and it shines where the output is messy or evolving (pretty-printers, error reports, end-to-end transcripts). Real World OCaml's testing chapter shows the workflow with ppx_expect.
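The expect-test idea itself is small: capture what the code prints, and compare it with a transcript recorded from an earlier run. The sketch below is hypothetical and framework-free (capture, pp_stack and recorded are made up for illustration); ppx_expect automates the recording and updating, with its own syntax.

```ocaml
(* Run a printing function against a buffer and return what it printed. *)
let capture (f : Format.formatter -> unit) =
  let buf = Buffer.create 64 in
  let fmt = Format.formatter_of_buffer buf in
  f fmt;
  Format.pp_print_flush fmt ();
  Buffer.contents buf

(* Code under test: a small pretty-printer for a stack's items. *)
let pp_stack fmt items =
  Format.fprintf fmt "[%s]" (String.concat "; " (List.map string_of_int items))

(* The recorded side: pasted in after reading the first run's output. *)
let recorded = "[3; 2; 1]"

(* From now on, any change to the printed output fails the check. *)
let actual = capture (fun fmt -> pp_stack fmt [ 3; 2; 1 ])
let passes = actual = recorded
```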


Activity

Every case in the Stack suite begins with let s = Stack.create () instead of sharing one stack across all cases. What is the main reason?

Why: a single shared mutable stack makes case n depend on everything cases 1..n-1 did to it. A failure then cascades into unrelated cases, and the result depends on run order, so a green suite stops meaning "each behaviour works." A fresh fixture per case isolates each one to the single thing it checks. OUnit2 does not forbid sharing (nothing stops you doing the wrong thing), and the speed difference is negligible; the reason is independence.

Write an OUnit2 test case named "push then pop on a fresh stack returns the pushed value". Use the value-oriented Stack from this lecture. The case should create a fresh stack, push 7, pop, and assert the result is 7 (with a printer).

let test_push_pop_seven _ = failwith "not implemented"

Reference solution:

let test_push_pop_seven _ =
  let s = Stack.create () in
  Stack.push 7 s;
  assert_equal ~printer:string_of_int 7 (Stack.pop s)

Arrange (create + push), act (the pop inside the assertion), assert (with a printer so a failure prints the numbers). That is the canonical shape of a case.

What's next

Lecture 5 takes the step this lecture deliberately did not: instead of hand-writing each input and its answer, generate inputs and check that a property holds over all of them. That is property-based testing with QCheck: you will see why functional code makes properties natural to state, and watch the shrinker cut a failing input down to its smallest form.

Lecture 6 tests stateful code against a reference; Lecture 7 puts the whole toolkit on one worked example.



Sources

This lecture's prose, worked examples, and quizzes are original to this course. Cornell CS3110's OUnit chapter is the conceptual antecedent for the case/suite/runner vocabulary; its prose is CC BY-NC-ND licensed and has not been derivatively reused. The Stack is the module-basics module in a value-oriented shape so cases stay independent. OUnit2 (MIT) is used through its public API. See LICENSES.md at the repository root for the full source posture.