GADTs: hlists and witnesses
The GADT basics lecture showed GADTs
with a single type index (the int expr / bool expr AST).
The use-cases lecture built a typed
pretty-printer, show : 'a ty -> 'a -> string, that read a witness
(T_int, T_string, ...) to decide how to render one value. This
lecture lifts that from a single value to a whole heterogeneous
list (an "hlist"): a list whose elements may have different types,
all tracked at the type level. We pair the hlist with a list of
witnesses, one per element, and reuse show on each. The result is
a clean answer to "how do I write a generic operation across a
fixed-but-mixed collection of typed values?"
This pattern is the engine behind OCaml's Format module: every
%d, %s, %b in a format string is a witness, and printf
folds across them. We will sketch the connection at the end.
Why we need hlists
An ordinary 'a list requires every element to share 'a. A
tuple like (int, string, bool) works for a fixed shape but does
not let you recurse over its elements; there is no "fold over a
tuple". When you need a fixed-shape but mixed-typed collection
that you can also recurse over, the OCaml answer is an hlist.
The encoding uses GADTs (so each HCons cell can carry its own
type for the head while preserving the rest in the tail's type)
plus a nested-pair convention in the type parameter to record the
sequence of element types.
Defining the hlist GADT
The two constructors are HNil (the empty hlist) and HCons
(prepend an element of any type to an existing hlist):
Read the index type from the outside in: the outermost component
of the tuple is the first element's type; the rest is the index
of the tail. HNil's index is unit, which is the "we are done"
marker. The encoding lets the compiler keep track of the entire
sequence of element types.
HCons (42, HCons ("hi", HCons (true, HNil))) builds the example
above. Its type is (int * (string * (bool * unit))) hlist, which
spells out the sequence: int, string, bool, end.
Pattern-matching an hlist
A function that pulls the head off an hlist must use type a r.
to allow refinement, just like every GADT pattern match:
The function's type (a * r) hlist -> a rules out applying
hhead to HNil, because HNil : unit hlist and unit does
not match the shape a * r. The compiler verifies this at the
call site: you cannot accidentally take the head of an empty hlist.
A witness list for the hlist
To print an element we need to know its type at runtime, and that
is exactly the witness ty from
the previous lecture's pretty-printer. We reuse
it unchanged. The one new idea: pair a whole list of those
witnesses with the hlist, indexed by the same type, so each
witness sits opposite the element it describes.
That shared index is the whole trick: an hlist of shape
(int * (string * (bool * unit))) demands a tylist of the same
shape, so the two recurse together with a witness always opposite
its value.
Generic pretty-printer: pp_hlist
Now the payoff: one recursive function walks both structures in lock-step, using each witness to render its element.
Read what just happened. We wrote one recursive function that
walks the hlist and the parallel witness list, dispatching on
the witness to choose how to render each element. We could write
fold_hlist of the same shape, or map_hlist, or whatever pattern
fits the problem.
The compiler refines ix (the shared index) at each TyCons / HCons step. In the body, t : 'a ty and v : 'a for the same
'a at this position. The compiler keeps the witness and the
value aligned every step of the recursion.
What if the witness disagrees with the value?
This is where the type system saves us. Try mismatching the witnesses and the values:
The signature pp_hlist : 'ix tylist -> 'ix hlist -> string list
shares one type variable. The compiler unifies the witness list's
index with the hlist's index; if they disagree, the call does not
compile. We get type-safe generic programming for free.
This is the design strength of GADTs: a single shared type parameter across two structures pins down a constraint that would otherwise be a runtime "I hope the types match" assertion.
Closing aside: Printf.printf does the same thing
OCaml's Printf.printf (and its Format cousin) work by exactly
this mechanism, but the witness list is encoded inside the format
string. When you write Printf.printf "%d %s %b\n", the compiler
parses the format string into a witness list of essentially the
shape TyCons (T_int, TyCons (T_string, TyCons (T_bool, TyNil)))
and uses it to insist on int -> string -> bool -> unit as the
type of the remaining call.
The technical details of OCaml's format GADT are involved (the
encoding is more elaborate to handle alignment, padding, and other
format specifiers). The principle is the one we just demonstrated:
a list of witnesses, walked in parallel with a list of values, with
the compiler enforcing alignment.
This is what we mean when we say GADTs are everywhere in OCaml,
quietly. Every time you call printf you are touching the
infrastructure; you do not have to build it yourself.
When not to reach for hlists
The general when-to-reach-for-GADTs
guidance
applies; hlists add one more choice to make. A plain list, a
record, or a tuple is usually what you want:
Hlists are a power tool. Reach for them when you have a real problem (database row types, format strings, typed-DSL argument lists) and not just to be clever.
A quick check
Why does pp_hlist : 'ix tylist -> 'ix hlist -> string list use
the same type variable 'ix for both arguments?
- OCaml requires identical type variables for argument pairs.
- The shared
'ixforces the witness list and the hlist to describe the same sequence of element types; a mismatch is a compile error. - It is purely cosmetic.
- It makes the function faster.
Why: the whole point of using witnesses is that the compiler
can pair up the witness's claim ("this slot is an int") with the
value's actual type. Sharing 'ix is what wires that together. If
the two parameters used different type variables, the compiler
would accept mismatched calls, defeating the purpose.
What is the index type of HCons (1, HCons (true, HNil))?
int * bool(int * bool) hlistint * (bool * unit)unit * (bool * (int * unit))
Why: the type index of an hlist is a right-nested pair, with
the head element's type at the outermost position. Two elements,
int then bool, give int * (bool * unit). The unit at the end
is the HNil-marker. Reading from left to right gives the order
the elements were prepended.
Activity
Show reference solution
A code quiz to consolidate the witness-driven pattern:
Write sum_int_hlist : (int * (int * unit)) hlist -> int that
sums a two-element hlist of ints. (No witnesses needed: the type
already promises both elements are int.)
Show reference solution
Reference solution:
let sum_int_hlist (HCons (x, HCons (y, HNil))) = x + y
The single pattern destructures the fixed-shape hlist down to its
two int elements. The type signature is enough to convince
OCaml that the pattern is exhaustive (no other shape can have
index int * (int * unit)). No type a. ... annotation needed
because the shape is fully concrete.
What is next
The closing tutorial brings monads
and GADTs together one more time. We build a small typed AST
(GADTs), evaluate it with an option monad (let*), and close by
showing how the GADT version rejects the
Module 5 tutorial's bad1 example
(Add (Bool true, _)) at compile time, sealing the forward
pointer that opened Module 8.
Reading
- Real World OCaml, More GADTs: https://dev.realworldocaml.org/gadts.html
Sources
This lecture's prose, worked examples, and quizzes are original to
this course. The hlist encoding and witness-driven recursion draw
on the author's CS3100 GADTs notebook, used here as a private
structural reference; the surface code, comments, and explanations
are written from scratch. Real World OCaml is CC
BY-NC-ND-licensed and has not been derivatively reused. See
LICENSES.md
at the repository root for the full source posture.