GADTs: variants with type-level information

Functional Programming with OCaml

GADTs: variants with type-level information

Module 8 · Lecture 4

KC Sivaramakrishnan
IIT Madras

This lecture switches gears entirely from the monad story. So far in Module 8 we have been about sequencing computations; now we turn to a more advanced type-system feature called generalized algebraic data types, almost always abbreviated GADTs. They are the second half of the OCaml toolkit needed for typed embedded languages, and they show up in serious OCaml code whenever you want the compiler to do more work for you.

Ordinary variants say "this value is one of a finite set of cases". GADTs add: "and each case can have a different type index". The practical consequence is that the compiler can prove things at compile time that an ordinary variant would have to check at runtime. Wrong combinations become type errors, not crashes.

The idea is in some ways simple; the notation is unusual; the type theory is involved. We keep it light and lean on worked examples.

This lecture: GADTs

Ordinary variant: same parameter for all constructors

To set the contrast, an ordinary parameterised variant:

Ordinary variant: same parameter for all constructors

type 'a tree = | Leaf | Node of 'a tree * 'a * 'a tree

Here 'a tree is uniformly indexed: every Leaf and every Node inside an 'a tree value uses the same 'a. If 'a = int, then the Node values carry ints; if 'a = string, they carry strings. The constructors do not choose their own type index; they share whatever the surrounding type says.

That is fine for most data structures. A list of ints, a tree of strings, a record of options: in all these the parameter is fixed by the outside. But sometimes the constructors need to choose their own indices independently. That is the case GADTs handle.

The GADT form

Look at this expression-AST type:

GADT: constructors with specific indices

type _ expr = | Int_lit : int -> int expr | Bool_lit : bool -> bool expr | Add : int expr * int expr -> int expr | If : bool expr * 'a expr * 'a expr -> 'a expr

Reading the constructors:

The _ in type _ expr is a placeholder; each constructor decides what fills it in.

The syntax Constructor : args -> result_type is unusual. Read it as an explicit type signature for the constructor: like a function signature, but for a data constructor. Ordinary variant constructors implicitly have a result type of "the type being defined, with the same parameters as the type header". GADT constructors say so explicitly, which lets them choose different result types.

Concretely:

This is what we mean by "type-level information". The constructor not only tags the data; it pins down the type of the value, and the compiler uses that information at compile time.

What we get: type-safe construction

Compare what the compiler accepts and rejects:

What we get: type-safe construction

type _ expr = | Int_lit : int -> int expr | Bool_lit : bool -> bool expr | Add : int expr * int expr -> int expr | If : bool expr * 'a expr * 'a expr -> 'a expr let e1 : int expr = Add (Int_lit 1, Int_lit 2) let e2 : int expr = If (Bool_lit true, Int_lit 5, Int_lit 10)

Both compile. Add is given two int exprs; If has a bool expr condition and matching branches.

The broken versions are caught at compile time

let bad = Add (Int_lit 1, Bool_lit true) let worse = If (Int_lit 5, Int_lit 1, Int_lit 2)

This is the GADT design payoff: programs that would otherwise crash an interpreter become type errors. With ordinary variants you would have to write an interpreter, walk the AST, and check at every node that the children have compatible types. With a GADT, the compiler refuses to even build a tree with incompatible children. The runtime check vanishes; the bad construction is rejected before any code runs.

The cost is on the construction side: every constructor application has to be at the right index. The compiler is unforgiving about this. If you want to write a program that builds an int expr, every step of the construction has to commit to int-typed values. You cannot have a runtime-dispatched "I will figure out later what type this is" because the compiler wants to know now.

Pattern matching with type refinement

A breath before the next bit. So far you have seen GADT syntax, the _ placeholder, and what type-safe construction buys. The section below is the centerpiece of the lecture: how the compiler uses the constructor's result type to refine the type during pattern matching, and the type a. ... annotation that turns this on. Read the eval walkthrough that follows slowly; the four cases each illustrate a different facet of the refinement.

First, see why a plain annotation will not do. The obvious eval : 'a expr -> 'a does not type-check:

First: why the naive version fails

let rec eval : 'a expr -> 'a = function | Int_lit n -> n (* this branch needs 'a = int *) | Bool_lit b -> b (* this branch needs 'a = bool *) (* ... and the other constructors ... *)

The fix is the type a. annotation: it lets a be a different type in each branch. With it, OCaml's pattern matching refines the type index per case: each branch knows which constructor fired and specialises a accordingly.

Pattern matching: type refinement

let rec eval : type a. a expr -> a = function | Int_lit n -> n | Bool_lit b -> b | Add (a, b) -> eval a + eval b | If (c, t, e) -> if eval c then eval t else eval e let _ = eval (Add (Int_lit 3, Int_lit 4)) (* = 7 *) let _ = eval (If (Bool_lit true, Int_lit 5, Int_lit 10)) (* = 5 *)

Two new things:

The type a. ... syntax says "this function is polymorphic in a, and inside the body a is an abstract type." Why is it needed here, when most OCaml functions need no annotation at all? Because eval does polymorphic recursion: it calls itself at a different type than the one it was called at. In the If case, the condition c is a bool expr even when the whole If is an int expr, so eval c recurses at bool while the outer call is at int. OCaml's automatic inference assumes a recursive function calls itself at the same type (that assumption is what keeps inference decidable), so it cannot work this case out on its own. The type a. annotation supplies the type up front, and that is exactly what lets each branch refine a independently. This is one of the few places in OCaml where an annotation is not optional.

Reading the cases:

The compiler is doing real work in every case: tracking which constructor matched, refining the abstract type accordingly, and type-checking the right-hand side under the refinement. This is the central feature that distinguishes GADTs from ordinary variants. With ordinary variants the index is fixed across branches; with GADTs it can change.

Why is this useful?

Why is this useful?

A more concrete way to think about the first benefit: when you build a program with Add (Bool_lit true, Int_lit 5), an ordinary AST would represent the program just fine, and the interpreter would eventually try to add a boolean to an integer and raise a runtime "type error". The compiler had no way to notice the mistake when you wrote the source. With a GADT, the compiler refuses to compile the source: the construction itself is ill-typed, before any evaluation runs.

For a toy interpreter that is a parlour trick. For a real language (a SQL query builder, a financial calculation engine, an embedded scripting language), it is a massive win: bugs that would otherwise hide in branches your tests never exercise become impossible to write.

The cost is real too. GADTs require more annotations, more locally-abstract-type declarations, and more thought. Pattern matching can occasionally need explicit type assertions to help the compiler refine correctly. The error messages, when GADT inference fails, are harder to read than ordinary type errors. Most OCaml code does not need GADTs and is better off without them; the code that does need them needs them in a serious way.

Units of measure

A units bug that cost $125 million

The Orbiter was not even the first famous unit victim. In 1983, Air Canada flight 143 ran out of fuel at 41,000 feet because the ground crew, mid-way through Canada's imperial-to-metric transition, computed the fuel load in pounds where the new 767 expected kilograms; the aircraft, suddenly a 132-seat glider, dead-sticked onto a drag strip at Gimli with no fatalities (the Gimli Glider). Sixteen years apart, the same bug: a number travelled without its unit.

That Mars Climate Orbiter loss is the motivating example: types can make a units mix-up impossible. We tag a float with its unit by declaring three uninhabited types, kelvin, celsius, and fahrenheit, that have no values of their own and exist only to sit in a type index. Then a GADT temp whose constructor picks the unit:

Units of measure: a typed temperature

type kelvin type celsius type fahrenheit type _ temp = | Kelvin : float -> kelvin temp | Celsius : float -> celsius temp | Fahrenheit : float -> fahrenheit temp

Now write addition so it can only ever combine temperatures of the same unit:

Adding temperatures, units enforced

let add_temp : type u. u temp -> u temp -> u temp = fun a b -> match a, b with | Kelvin x, Kelvin y -> Kelvin (x +. y) | Celsius x, Celsius y -> Celsius (x +. y) | Fahrenheit x, Fahrenheit y -> Fahrenheit (x +. y) let _ = add_temp (Kelvin 20.) (Kelvin 30.) (* = Kelvin 50. *) let _ = add_temp (Kelvin 20.) (Celsius 12.) (* error: celsius temp is not compatible with kelvin temp *)

This is a phantom type encoded with a GADT: the unit lives only in the type index, never in the runtime value, yet it is enough for the compiler to keep the units apart. Real OCaml code uses exactly this pattern for units, currencies, and tagged identifiers.

When to reach for GADTs

When to reach for GADTs

Use a GADT when:

Avoid them when:

Rare in everyday code; heavy use in interpreters, query builders, library cores.

A useful question to ask before reaching for a GADT: "what specifically would go wrong if I used an ordinary variant and runtime checks?" If the answer is "I would crash with a type-mismatch error after running for hours", GADTs are worth it. If the answer is "I would have to add an option return type and pattern-match in two places", they are probably not.

The next lecture shows real use cases that pull this into focus, typed pretty-printers, type-safe builders, and length-indexed lists, and the lecture after adds heterogeneous lists and the GADT machinery behind Printf.

A quick check

What is the type of Add (Int_lit 1, Int_lit 2)?

Why: the constructor's signature is Add : int expr * int expr -> int expr. Given two int expr arguments, the result type is int expr. The compiler refuses to apply Add to anything else.

Why does the eval function need the annotation type a. a expr -> a?

Why: without the locally-abstract-type annotation, the compiler tries to infer a single concrete type for a and fails because different branches refine it differently. The type a. ... annotation says "treat a as fresh and abstract", which is what GADT pattern matching needs. This is a recurring quirk: GADT functions often need this annotation explicitly.

Activity

The lecture's expr has Int_lit, Bool_lit, Add, If. Add a constructor Is_zero : int expr -> bool expr (true when its int expr evaluates to 0) and extend eval to handle it. It takes an int expr but produces a bool expr.

Show reference solution

Activity solution: the type

type _ expr = | Int_lit : int -> int expr | Bool_lit : bool -> bool expr | Add : int expr * int expr -> int expr | If : bool expr * 'a expr * 'a expr -> 'a expr | Is_zero : int expr -> bool expr
  • Is_zero consumes an int expr but its result type is bool expr, so it can sit in If's condition slot.

Activity solution: eval

let rec eval : type a. a expr -> a = function | Int_lit n -> n | Bool_lit b -> b | Add (x, y) -> eval x + eval y | If (c, t, e) -> if eval c then eval t else eval e | Is_zero e -> eval e = 0 let _ = eval (If (Is_zero (Add (Int_lit 2, Int_lit (-2))), Int_lit 1, Int_lit 0)) (* = 1 *)
  • The Is_zero case computes the inner int, then compares to 0.

A code quiz:

Define a GADT type _ value with two constructors VInt : int -> int value and VBool : bool -> bool value. Write unwrap : type a. a value -> a that returns the underlying value.

type _ value = | VInt : int -> int value | VBool : bool -> bool value let unwrap : type a. a value -> a = fun _ -> failwith "not implemented"
Show reference solution

Reference solution:

let unwrap : type a. a value -> a = function
  | VInt n -> n
  | VBool b -> b

The type a. ... annotation makes a locally abstract so that each branch can refine it (a = int for VInt, a = bool for VBool). Without the annotation, the compiler cannot find a single concrete a to satisfy both branches.

What is next

What is next

Lecture 5: GADT use cases.

The next lecture takes the basic machinery here and shows real applications. The tutorial later combines a GADT-based typed AST with an option-monad evaluator that can fail at runtime (division by zero, say) while still guaranteeing type safety on the success path. That is the capstone for the OCaml half of the course.

Reading

Sources

This lecture's prose, worked examples, and quizzes are original to this course. Materials referenced during preparation are listed in the Reading section above; Cornell CS3110 and Real World OCaml are CC BY-NC-ND-licensed and have not been derivatively reused. See LICENSES.md at the repository root for the full source posture.