Variants (sum types)
Tuples and records express "and": a 2D point has an
x and a y; a person has a name and an age and an email.
Variants express "or": a shape is a circle or a square or a
rectangle; the result of parsing is a success or a failure; a
value in an arithmetic expression is a number or a sum or a
product. Almost every interesting data type you will design in
OCaml is a combination of these two: variants for the "kinds,"
records (or tuples) for the data inside each kind.
The name variant refers to the fact that a value of a variant type varies between several possibilities. The names sum type and disjoint union refer to the underlying set theory: the set of values of a variant is the disjoint union of the sets of values of its alternatives. The two names you will hear most often in practice are variant (OCaml's term) and tagged union (the implementation view: each value carries a tag identifying which alternative it is, plus the payload data for that alternative). Algebraic data type is the umbrella term that combines variants and records.
If you have used C, the closest analogue is an enum tag plus a
union plus the programmer discipline to keep the tag and the
union consistent. OCaml folds all three into a single declaration
and asks the compiler to enforce the consistency. If you have used
Rust, the same idea appears as enum; if you have used Java with
sealed interfaces or Kotlin with sealed classes, the same idea
appears there too. Each of these is following the lead of the
ML family, where variants have been around since the 1970s.
Type aliases
Before getting to variants proper, one bit of supporting syntax. OCaml lets you give a short name to an existing type. This is called a type alias (or type abbreviation):
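A minimal sketch of such declarations, assuming the names point and points used in the paragraph that follows; the example values origin, path, and raw are illustrative only:

```ocaml
(* Two aliases: a short name for a pair of floats, and one for a list of them. *)
type point = float * float
type points = point list

(* Values can be annotated with the alias... *)
let origin : point = (0.0, 0.0)
let path : points = [ origin; (1.0, 2.0) ]

(* ...or with the underlying type; the compiler sees no difference. *)
let raw : float * float = origin
```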
After these declarations, point and float * float are the
same type; the compiler treats them as interchangeable. The name
exists purely for readability. points is point list, which is
(float * float) list. Three names for the same type, depending
on what you want to emphasise at each call site.
Aliases are useful when the underlying type is verbose, when the
same compound type shows up in many signatures, or when the name
carries information beyond the structure. They are not useful
when you want type safety between two structurally-identical
concepts. type ms = int and type fps = int are both int to
the compiler; you can freely substitute one for the other. For
real type safety here, you need a record
or a single-constructor variant (see below).
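The difference can be sketched as follows. The helper sleep_budget and the type names ms_checked and fps_checked are hypothetical, chosen only for illustration; constructors with payloads are explained properly later in this lecture.

```ocaml
(* Plain aliases: both names are just int to the compiler. *)
type ms = int
type fps = int

(* Hypothetical helper that is meant to take a duration in milliseconds. *)
let sleep_budget (d : ms) : int = d * 2

let frame_rate : fps = 60

(* Accepted without complaint, although a frame rate is not a duration. *)
let mixed_up = sleep_budget frame_rate

(* Single-constructor variants: two distinct types, each wrapping an int. *)
type ms_checked = Ms of int
type fps_checked = Fps of int

let delay = Ms 250
let rate = Fps 60

(* let both = [ delay; rate ]
   rejected: a list cannot mix ms_checked and fps_checked values *)
```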
Declaring a variant: the enum case
The simplest variant is one whose alternatives carry no data:
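A sketch of such a declaration, using the type name direction and the four constructor names from the discussion that follows; the value heading is illustrative:

```ocaml
(* An enum-style variant: four constructors, none of which carries data. *)
type direction = North | South | East | West

(* A constructor on its own is already a value of the type. *)
let heading : direction = North
```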
Four constructors (North, South, East, West), separated
by |. A value of type direction is exactly one of these four.
The capitalisation is mandatory: OCaml uses the first character of an identifier to distinguish a constructor (starts with capital, denotes a variant case) from a variable (starts with lowercase, denotes a binding). The compiler tells the two apart by lexical rule alone, which saves you from ambiguity later when we get to patterns.
These tags carry no implicit numeric encoding: C lets you write
North = 0 and treat directions as ints; OCaml does not. If you
want a mapping to ints, write a function:
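One way to write it with only the tools seen so far is an if/else cascade over equality tests. The function name int_of_direction and the particular numbering are assumptions made for this sketch:

```ocaml
type direction = North | South | East | West

(* Explicit mapping from each tag to an int; the numbering is our choice. *)
let int_of_direction (d : direction) : int =
  if d = North then 0
  else if d = South then 1
  else if d = East then 2
  else 3 (* the only remaining possibility is West *)
```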
This is more verbose than C's implicit enumeration, but it is
explicit: the mapping is in one place. (We will see in
Module 5 how match makes this
cascade much tidier, and how the compiler can warn you if a case
is missing.)
Combining variants and records: a card
The domain often has more than one axis of choice. A playing card has a suit and a rank, each independent of the other. Each axis is a small enum variant; the card itself is a record that bundles both:
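A minimal sketch of that design. The constructor names, the field names suit and rank, and the example value ace_of_spades are assumptions made for illustration:

```ocaml
(* First axis of choice: four suits. *)
type suit = Clubs | Diamonds | Hearts | Spades

(* Second axis of choice: thirteen ranks. *)
type rank =
  | Two | Three | Four | Five | Six | Seven | Eight | Nine | Ten
  | Jack | Queen | King | Ace

(* The record bundles one value from each axis. *)
type card = { suit : suit; rank : rank }

let ace_of_spades = { suit = Spades; rank = Ace }
```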
Two enum-case variants and one record that combines them. The
variants capture "exactly one of these alternatives"; the record
captures "all of these together". A card is therefore "this
suit and this rank", drawn from the 4 x 13 = 52 combinations.
This is the everyday shape of Module 4 data: variants for the axes of choice, records for the bundling. Recursion in the variants (lists, trees, expressions) is the topic of the next lecture.
Constructors with payload
A variant becomes much more interesting when its constructors carry data. This is the syntax for attaching data to a constructor:
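A sketch of such a declaration, using the type name shape and the constructors Circle, Square, and Rectangle that the rest of this section refers to:

```ocaml
(* Each constructor names one kind of shape and the data that kind carries. *)
type shape =
  | Circle of float            (* radius *)
  | Square of float            (* side length *)
  | Rectangle of float * float (* width and height *)
```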
Each "of TYPE" clause specifies what data the constructor carries.
Circle of float says "a Circle carries one float (its
radius)"; Rectangle of float * float says "a Rectangle carries
two floats (its width and height)."
A constructor is applied to its payload by juxtaposition (like a
function call without parentheses): Circle 3.0, Square 5.0. For
constructors that take multiple components, the payload is written as
a tuple wrapped in parens: Rectangle (4.0, 6.0). The parentheses are
required: without them, Rectangle 4.0, 6.0 would be read as the pair
(Rectangle 4.0), 6.0, which the compiler rejects.
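A subtle but important point: Rectangle is not a curried,
two-argument function. Its payload is written as one parenthesised
group, like a pair, and float * float in of float * float describes
that whole payload. This matters for pattern matching: you write
Rectangle (w, h) (one pair-shaped pattern, parens required), not
Rectangle w h (which the compiler rejects). One caveat: OCaml does
distinguish of float * float from of (float * float), and only the
second accepts an already-built pair value, as in
let p = (4.0, 6.0) in Rectangle p.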
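The construction syntax described above can be tried directly. This sketch assumes the shape declaration from this section; the value names are illustrative:

```ocaml
type shape =
  | Circle of float
  | Square of float
  | Rectangle of float * float

(* Single-component payloads: constructor, then the value. *)
let c = Circle 3.0
let s = Square 5.0

(* Multi-component payload: one parenthesised, comma-separated group. *)
let r = Rectangle (4.0, 6.0)

(* All three are values of the same type, so they can share a list. *)
let shapes : shape list = [ c; s; r ]
```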
Using a variant: forward-pointer to pattern matching
To use a variant value, we need a language feature that inspects a value and dispatches on which constructor was used, binding the payload along the way. That feature is pattern matching, the subject of Module 5. Pattern matching also brings two compiler-checked guarantees you will lean on heavily:
- Exhaustiveness: the compiler tracks every constructor of the variant and warns if your code forgets a case.
- Refactor-with-the-compiler: when you add a new constructor, the compiler flags every site that pattern-matches on the type, giving you a punch list of places to update.
The combination of variants + pattern matching is the engine of nearly every interesting OCaml program: interpreters, parsers, type checkers, network protocol decoders, configuration loaders, web routers. We will spend Module 5 on it.
Constructors with multi-field payloads: inline records
When a constructor's payload has more than one or two components,
the tuple form (Constructor of t1 * t2 * t3) becomes unwieldy.
For these cases, OCaml supports inline records as constructor
payloads:
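A minimal sketch, mirroring the Rust comparison later in this section. The type name tcp_state and the exact set of states and fields are assumptions; a fuller model would likely have more states.

```ocaml
(* A connection is either waiting for a peer or talking to one. *)
type tcp_state =
  | Listening
  | Connected of { peer : string; bytes_sent : int }
```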
Each non-trivial state carries the data relevant to that state, and each piece of data has a name. Constructing values works the same as before, but with the field-name syntax:
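For instance, assuming a two-state tcp_state type modelled on the Rust comparison in this section (the value names and the peer address are made up for the example):

```ocaml
type tcp_state =
  | Listening
  | Connected of { peer : string; bytes_sent : int }

(* A constructor without payload is written on its own. *)
let idle = Listening

(* An inline-record payload is built with the usual { field = value } syntax. *)
let active = Connected { peer = "192.0.2.10"; bytes_sent = 0 }
```

Fields can be listed in any order, as with ordinary records.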
Extracting the data uses pattern matching with the inline-record syntax on the pattern side, which we will see in Module 5.
This is OCaml's version of Rust's struct-style enums:
enum TcpState {
    Listening,
    Connected { peer: String, bytes_sent: usize },
}
The inline record syntax was added to OCaml in 4.03 (2016) and has become idiomatic for variant payloads of more than two pieces of data.
Variants are not a corner feature you reach for occasionally.
They are pervasive in OCaml: bool, list, option, and
result are all variants under the hood. We will see the
recursive ones (list, trees, expressions) in the
next lecture, where they are also
the natural setting to introduce parameterised variants (lists
of anything, trees of anything) and polymorphism.
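A short sketch of these built-in types in use, built with their standard constructors. The value names are illustrative; the type annotations are optional and included only for readability.

```ocaml
(* bool has two constructors, written in lowercase as a special case. *)
let flag : bool = true

(* option: either None, or Some carrying a value. *)
let maybe : int option = Some 3
let nothing : int option = None

(* result: either Ok carrying a value, or Error carrying an error. *)
let parsed : (int, string) result = Ok 42
let failed : (int, string) result = Error "not a number"

(* list: either the empty list [], or an element consed onto a list with ::. *)
let xs : int list = 1 :: 2 :: []
```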
A small check
Given:
Which of these are valid constructor applications?
- Circle 3.0
- Square "5" (wrong payload type)
- Rectangle (4.0, 6.0)
- Triangle 5.0 (not a constructor of shape)
Why: each constructor is applied to a payload that matches its
declared type. Circle 3.0 and Rectangle (4.0, 6.0) are
well-typed values of type shape. Square "5" has a string
payload where the type says float; the compiler rejects it.
Triangle is not a constructor of shape, so the compiler reports
Unbound constructor Triangle.
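The check can be replayed in the toplevel. This sketch assumes the shape declaration from earlier in the lecture; the two rejected applications are left in comments so that the file still compiles.

```ocaml
type shape =
  | Circle of float
  | Square of float
  | Rectangle of float * float

(* Well-typed: each payload matches the declared type. *)
let ok1 = Circle 3.0
let ok2 = Rectangle (4.0, 6.0)

(* let bad1 = Square "5"
   rejected: a string payload where the declaration says float *)

(* let bad2 = Triangle 5.0
   rejected: Triangle is not a constructor of shape *)
```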
Design a variant http_response for HTTP responses. Cover three
shapes:
- A Success carrying the response body : string.
- A Redirect carrying the target url : string.
- An Error carrying a code : int and a message : string.
Then construct one example value of each constructor.
Reference solution:
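One possible shape for it, offered as a sketch: the plain string payloads for Success and Redirect, and the example values, are choices made here for illustration, and other designs are equally valid.

```ocaml
type http_response =
  | Success of string                         (* response body *)
  | Redirect of string                        (* target URL *)
  | Error of { code : int; message : string } (* status code and reason *)

(* One example value per constructor. *)
let ok = Success "<h1>Hello</h1>"
let moved = Redirect "https://example.com/new"
let not_found = Error { code = 404; message = "Not Found" }
```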
The Error case uses an inline record because two named fields
read more cleanly than Error of int * string.
Activity
Notice how the lecture has so far only declared variant types and constructed values of them. We have not written a single function that takes a variant apart. That deconstruction step is exactly what pattern matching gives us, and we will spend the whole of Module 5 on it.
What's next
We have seen variants with fixed payloads: a Circle always
carries one float. The
next lecture lets a constructor's
payload include a value of the type being defined. That is the
doorway to lists, trees, and arbitrary tree-shaped data.
Reading
- Cornell CS3110, Variants: https://cs3110.github.io/textbook/chapters/data/variants.html
- Cornell CS3110, Algebraic data types: https://cs3110.github.io/textbook/chapters/data/algebraic_data_types.html
- Real World OCaml, Variants: https://dev.realworldocaml.org/variants.html
Sources
This lecture's prose, worked examples, and quizzes are original to
this course. Materials referenced during preparation are listed in
the Reading section above; Cornell CS3110 and Real World OCaml
are CC BY-NC-ND-licensed and have not been derivatively reused.
See LICENSES.md
at the repository root for the full source posture.