Basic patterns
You have already met pattern matching in small doses: match ... with
appeared in Module 3 when we wrote recursive
list functions, and let (x, y) = ... appeared in
Module 4 when we took tuples apart. From this
lecture on, pattern matching moves to the centre of the language. It
is, more than any single other feature, what makes OCaml feel like
OCaml. You will reach for it dozens of times a day: to take apart a
tuple, to dispatch on a constructor,
to walk a tree, to handle an
option, to write the body of
nearly every interesting function.
The shape we start with is the match expression, the way you ask
"what shape is this value, and what should I do for each shape?"
In a curly-brace language you would reach for switch, or a
sequence of if/else if, or a dispatch table. OCaml's match
is the moral equivalent of all three, with one important upgrade:
the cases on the left can be structured patterns, not just
constants. The compiler will check that the patterns cover every
possibility, and warn when they do not. This lecture establishes
the basic shape and the three simplest pattern forms:
- Literal patterns match a specific value (0, 'a', "hello", true).
- Variable patterns match anything and give the matched value a name.
- The wildcard _ matches anything and binds no name.
The next five lectures build on this base.
Lecture 2 pairs these patterns
with the recursive types from Module 4 (lists, trees) and gives us the
canonical shape of every list and tree function.
Lecture 3 covers patterns
inside patterns, record-pattern shorthands, the diagonal idiom,
inline records inside variants, and or-patterns.
Lecture 4 covers when-guards.
Lecture 5 is about exhaustiveness,
the most load-bearing static check in the language.
Lecture 6 is the tutorial: an
interpreter for the OCaml AST.
The shape of a match
Here is the smallest interesting example. Given an integer, we
want to label 0, 1, and 2 by name, and call everything else
"many":
Two important things to notice. First: a match is an
expression, not a statement. The whole thing has a value. That
value is the right-hand side of whichever clause matched. So you
can write let label = match n with ..., or pass a match as a
function argument, or use it as the body of another expression.
There is no separate "switch statement" in OCaml because there is
no statement/expression divide to begin with: everything is an
expression, and match is one of the most useful ones.
Second: the first pattern that matches wins. The clauses are tried
in order, top to bottom. As soon as one matches, its right-hand
side runs and the others are skipped. This is the same dispatch
rule as a chain of if/else if, and the same rule as a
fall-through-free C switch. We will return to this rule several
times in the lecture, because it interacts in interesting ways
with the variable pattern.
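The example this section opens with is not reproduced in this copy. A minimal sketch that fits the description (0, 1 and 2 labelled by name, everything else "many") follows; the function name classify and the exact label strings are assumptions. It also shows both points above: the match is bound with let like any other expression, and the clauses are tried top to bottom.

```ocaml
(* Assumed reconstruction: the name classify and the labels are illustrative. *)
let classify n =
  match n with
  | 0 -> "zero"
  | 1 -> "one"
  | 2 -> "two"
  | _ -> "many"

let () =
  (* A match is an expression, so its result can be bound with let. *)
  let label = classify 2 in
  print_endline label;            (* prints: two *)
  print_endline (classify 17)     (* prints: many *)
```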
The leading | before the first clause is optional. Many
codebases include it for visual alignment with the rest. Some
omit it on the first line. Both are common; pick a style and be
consistent. Throughout this course we include the leading bar.
Three kinds of pattern this week
The patterns you can write on the left of a clause form a small language of their own. By the end of Module 5 we will have seen most of it. For this first lecture we restrict ourselves to three forms:
A literal pattern is a value spelled exactly as it appears.
0 matches the integer 0. 'a' matches the character 'a'.
"hello" matches the string "hello". true matches the boolean
true. The check is the same structural equality (=) we saw in
the operators lecture. Two
strings with the same bytes match; two records with the same fields
match. The literal pattern is the workhorse for "is this the special
case I want to handle?"
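As an illustration, here is a sketch that pairs two literal patterns with a final variable pattern. The function name describe is hypothetical and not taken from the original cell.

```ocaml
(* describe is a hypothetical name used only for this sketch. *)
let describe n =
  match n with
  | 0 -> "zero"              (* literal pattern: matches only 0 *)
  | 1 -> "one"               (* literal pattern: matches only 1 *)
  | n -> string_of_int n     (* variable pattern: matches anything, binds n *)

let () = print_endline (describe 42)   (* prints: 42 *)
```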
A variable pattern is a lowercase identifier. It matches
anything at all, and inside the right-hand side, that
identifier is bound to whatever the value was. So in the example
above, the pattern n matches 42 (because it matches
everything), and inside the body string_of_int n produces
"42". A variable pattern is how you say "I want to handle the
remaining cases uniformly, and I need a name for the value."
The wildcard _ is a special pattern that matches anything,
just like a variable pattern, but binds no name. The difference is
purely about whether you intend to use the value. Two reasons to
reach for _:
- You want a catch-all clause and do not need to refer to the matched value. | _ -> "default" is the standard way to write this.
- You want to destructure a value but ignore some piece of it. | (x, _) -> x extracts the first component of a pair without giving the second a name.
Development tooling warns about unused pattern variables: dune's
default build profile enables warning 27, so if you write
(x, y) -> x and never use y, the build will tell you. (The
bare toplevel, including the cells on this page, leaves that
warning off, so there it stays silent.) The wildcard is the way
to say "I am ignoring this on purpose."
Why pattern order matters
Now for the trap that catches almost every student at least once.
Consider this version of classify:
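The cell is missing from this copy. Reconstructed from the strings quoted in the next paragraph, it was presumably close to the following sketch:

```ocaml
(* Deliberately wrong clause order: the variable pattern comes first.
   The compiler reports warning 11 on the second clause. *)
let classify n =
  match n with
  | x -> "variable: " ^ string_of_int x
  | 0 -> "this never fires"
```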
What does classify 0 return? Not "this never fires". The
answer is "variable: 0". The code compiles (the compiler warns
about the second clause), runs, and returns the wrong-looking
answer, exactly as advertised.
The variable pattern x matches everything, including 0. The
clauses are tried top to bottom; x succeeds on the very first
try; the right-hand side runs; the second clause is never
visited. OCaml's compiler will warn you about this:
Warning 11 [redundant-case]: this match case is unused.
Warning 11 is the dual of the exhaustiveness warning (warning 8, which we will meet in Lecture 5): it says you have a clause that never fires, usually because an earlier clause already covers it. When you see warning 11, you have almost certainly put a variable pattern (or a wildcard) before something more specific, and that more specific clause is dead.
The discipline is simple: specific patterns first, general patterns last. Put your literal cases before any variable or wildcard catch-all. The example becomes:
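The corrected cell is not shown in this copy; a version consistent with the results quoted next would be:

```ocaml
(* Specific pattern first, general pattern last. *)
let classify n =
  match n with
  | 0 -> "zero"
  | x -> "non-zero: " ^ string_of_int x
```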
Now classify 0 is "zero" and classify 7 is "non-zero: 7",
which is what we wanted.
This rule looks pedantic until you start writing pattern matches on data types with several constructors. The compiler will not guess what you meant; it will faithfully apply the order you wrote, and the warning is your only defence.
_ versus a variable name
There is a subtle but real difference between _ and a fresh
variable name like _unused, even though both match anything.
The wildcard _ cannot be referenced on the right-hand side; it
binds nothing. A variable name (even one starting with an
underscore, like _x) is a binding, and you can use it in the
body. The convention is: use _ when you do not need the value;
use a name that starts with _ when you want to document what
the ignored piece is, but you still do not intend to use it.
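A sketch of both conventions follows. first_only is the name the text uses; first_only_named and _second are hypothetical names added for comparison.

```ocaml
(* Wildcard: the second component is ignored and gets no name. *)
let first_only pair =
  match pair with
  | (x, _) -> x

(* Underscore-prefixed name: still a real binding, but it documents the
   ignored piece and does not trigger the unused-variable warning. *)
let first_only_named pair =
  match pair with
  | (x, _second) -> x
```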
first_only (10, 20) returns 10. The wildcard in the second
position says "there is something here, I do not care what."
Writing (x, y) -> x also compiles, and runs silently in the
toplevel (warning 27 is off by default there, so the cell below
prints nothing unusual):
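The cell is not reproduced in this copy; a minimal stand-in, with the hypothetical name first_noisy, could be:

```ocaml
(* y is bound but never used; warning 27 flags this when it is enabled. *)
let first_noisy pair =
  match pair with
  | (x, y) -> x
```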
Build the same code under dune's default profile, which enables the warning, and the compiler tells you:
Warning 27 [unused-var-strict]: unused variable y.
The wildcard is the way to say "I am ignoring this on purpose; please do not warn me."
Both warnings (11 and 27) are part of the same general philosophy: the compiler does its best to flag patterns that look like they were written by mistake. Most of the time the warning is right.
The catch-all wildcard
The wildcard is also the standard way to write a default clause:
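The original cell is missing here. The lecture later refers to it as direction_label; the input type and the exact clauses below are assumptions made for this sketch.

```ocaml
(* Assumed reconstruction: only the name direction_label comes from the text. *)
let direction_label d =
  match d with
  | "N" -> "north"
  | "S" -> "south"
  | "E" -> "east"
  | "W" -> "west"
  | _ -> "unknown"
```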
Four specific clauses, one catch-all. This is the everyday shape for "I have a finite list of known inputs and want to fall back on a default."
A small caution about catch-all wildcards on variant types:
they suppress exhaustiveness checking. If you have a variant
type with five constructors and you write four specific cases
plus a wildcard, and then later add a sixth constructor, the
compiler will not warn you, because the wildcard "covers" the
new case (probably with the wrong answer). We will return to
this in Lecture 5.
For now, use the wildcard freely on ints and strings, where
there is no other way to enumerate the cases; use it more
cautiously on variants.
function shorthand
When a one-argument function's whole body is a match on the
argument, OCaml lets you skip the boilerplate. Instead of
let f x = match x with ..., you can write let f = function ...
and list the clauses directly.
The function keyword stands for "take one argument and
immediately match on it." It is shorter and reads more cleanly
when the function is essentially a multi-way dispatch on its
input. Idiomatic OCaml uses function very often. We will use it
throughout the rest of this module.
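The two spellings side by side, as a small sketch (is_zero_long and is_zero are hypothetical names):

```ocaml
(* Long form: name the argument, then match on it immediately. *)
let is_zero_long n =
  match n with
  | 0 -> true
  | _ -> false

(* Shorthand: function takes one argument and matches on it. *)
let is_zero = function
  | 0 -> true
  | _ -> false
```

Both definitions have type int -> bool and behave identically.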
The one limitation: function matches on a single argument, the
one it introduces. For a two-argument function, you have two
options: a regular match on a tuple of the two arguments, or an
ordinary let definition with a match on one of the arguments in
its body. The function shorthand cannot dispatch on both
arguments directly. For one-argument functions, prefer function.
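A sketch of the two options for two-argument functions. The names both_zero and describe_count are hypothetical.

```ocaml
(* Option 1: match on a tuple built from both arguments. *)
let both_zero a b =
  match (a, b) with
  | (0, 0) -> true
  | _ -> false

(* Option 2: name both arguments and match on one of them in the body. *)
let describe_count noun n =
  match n with
  | 0 -> "no " ^ noun ^ "s"
  | 1 -> "one " ^ noun
  | n -> string_of_int n ^ " " ^ noun ^ "s"
```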
Patterns appear everywhere, not just in match
The patterns you use in match are not a match-specific
feature. They are a general feature of OCaml that surfaces in
several places.
The left-hand side of a let is also a pattern. So you can
destructure a tuple in a let binding directly: let (x, y) = (3, 4)
binds x = 3 and y = 4. We have been using this since
Module 4
without calling it pattern matching, but that is exactly what it is.
Function parameters are also patterns, and any pattern OCaml can write fits there. Four shapes you will see constantly:
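The four shapes from the original cell are not shown in this copy. The selection below (tuple, record, wildcard, unit) is an assumed one, and the point type with int fields and all function names are hypothetical.

```ocaml
type point = { x : int; y : int }   (* field types are assumed *)

(* Tuple pattern: both components are named in the parameter. *)
let add (a, b) = a + b

(* Record pattern: each field is bound to a local name. *)
let norm1 { x = px; y = py } = abs px + abs py

(* Wildcard pattern: the argument is accepted and ignored. *)
let always_zero _ = 0

(* Unit pattern: () is the only value of type unit. *)
let greet () = "hello"
```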
The variant case is similar but comes with a caveat:
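The cell is missing from this copy. Going by the function name k used in the next paragraph, it was presumably along these lines (the try block is added here only to demonstrate the failure):

```ocaml
(* Warning 8: this parameter pattern does not cover None. *)
let k (Some x) = x

let () =
  assert (k (Some 3) = 3);
  (* Calling k on None fails at runtime rather than at compile time. *)
  try ignore (k None) with
  | Match_failure _ -> print_endline "k None raised Match_failure"
```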
The compiler warns that the pattern is not exhaustive: the
argument might be None, which the pattern does not cover.
Calling k None raises Match_failure at runtime. So the rule
on let-bindings and function-parameter patterns is: they accept
one pattern and the value had better match it. Tuples and
records are always safe (every value of int * int is a pair,
every value of point has an x and a y); variants with more
than one constructor are not. We come back to this in
Lecture 5.
You can write the destructure separately if you prefer:
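A sketch of the two forms on a pair argument. Which function the original cell used is not known; add and add_separately are hypothetical names.

```ocaml
(* Pattern directly in the parameter position. *)
let add (a, b) = a + b

(* Same function, with the destructuring done by a separate let. *)
let add_separately p =
  let (a, b) = p in
  a + b
```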
The pattern-in-parameter form is just shorter; both desugar to roughly the same code. Use the parameter form when the function expects a structured argument and you want the pieces named right away.
A quick taste of exhaustiveness
Even at this early stage, the compiler is watching for missing cases. Here is a non-exhaustive match:
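The cell is not reproduced here. A plausible stand-in, consistent with the fixes suggested in the next paragraph, is the following (small_name is a hypothetical name):

```ocaml
(* Only 0 and 1 are covered, so the compiler reports warning 8.
   Any other input raises Match_failure at runtime. *)
let small_name n =
  match n with
  | 0 -> "zero"
  | 1 -> "one"
```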
OCaml emits warning 8 and reports a sample missing input. The
fix is either to add more specific clauses (| 2 -> "two", etc.)
or to add a wildcard catch-all (| _ -> "many"). On finite types
like booleans or small variants, you can usually enumerate every
case; on int, you almost always end with a wildcard.
Lecture 5 covers exhaustiveness in
detail; for now, just know that the warning exists and is helpful.
How match evaluates
The mental model for match v with | p1 -> e1 | p2 -> e2 | ...
is:
- Evaluate v to a value.
- Try to match that value against p1.
- If p1 matches: bind any variables that p1 introduces, then evaluate e1. That is the answer.
- If p1 does not match: try p2. And so on.
- If no pattern matches: raise Match_failure at runtime.
The "match" check itself is structural. A literal pattern 0
matches the value 0. A variable pattern always matches and
records the binding. A wildcard always matches and records
nothing. Patterns we will see in later lectures (tuples,
constructors, lists) match piece by piece.
Crucially, the value v is evaluated exactly once. The
patterns do not re-trigger any side effect in v. So this is
safe:
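The original cell is missing. The sketch below matches directly on a call that has a side effect; the counter calls and the helper roll are additions for illustration, so that the single evaluation can be observed.

```ocaml
(* calls counts how many times the scrutinee expression is evaluated. *)
let calls = ref 0

let roll () =
  incr calls;
  Random.int 10

let describe_roll () =
  match roll () with
  | 0 -> "zero"
  | 1 -> "one"
  | n -> "other: " ^ string_of_int n

let () =
  print_endline (describe_roll ());
  (* roll ran once, no matter how many clauses were tried. *)
  Printf.printf "roll was called %d time(s)\n" !calls
```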
The Random.int 10 call happens once; the resulting number is
the thing the patterns inspect.
Putting it to work: a small check
A common idiom: convert a "kind" represented as a string into a typed value, with a default for unknown inputs.
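The cell is not shown in this copy. A sketch of the idiom follows, assuming the "kind" is a log level; the level type, its constructors, the input strings, and the choice of Info as the default are all assumptions.

```ocaml
type level = Debug | Info | Warn | Error   (* assumed constructors *)

let level_of_string = function
  | "debug" -> Debug
  | "info" -> Info
  | "warn" -> Warn
  | "error" -> Error
  | _ -> Info   (* default for unknown inputs; Info is an arbitrary choice *)
```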
The four literal patterns handle the
recognised levels; the wildcard handles everything else. The
shape is repetitive enough that it is tempting to reach for a
hash table or an assoc list, but at five clauses the pattern
match is shorter and clearer, and it is unlikely to be slower
than a lookup in either structure.
Two checks
What does this evaluate to?
- "zero"
- "other"
- Warning 11 (unused case)
- Compile error
What does this evaluate to?
- "got 0" (with a compiler warning)
- "zero"
- Compile error
- Match_failure at runtime
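The cell this question refers to is not reproduced in this copy. Judging from the explanation that follows, it was presumably close to this sketch:

```ocaml
(* The question asks for the value of result. *)
let f = function
  | n -> "got " ^ string_of_int n
  | 0 -> "zero"

let result = f 0
```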
Why: the first clause n is a variable pattern. It matches
anything, including 0, and binds n to the matched value. So
f 0 runs the first clause and produces "got 0". The second
clause 0 -> "zero" is unreachable. OCaml emits warning 11
("this match case is unused"), but the code still compiles and
runs.
A code task:
Write traffic_action : string -> string that returns:
- "stop" when the input is "red",
- "slow" when the input is "yellow",
- "go" when the input is "green",
- "unknown signal" for anything else.
Use a function shorthand with literal patterns and a wildcard.
Reference solution
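The solution cell is missing from this copy. One solution that meets the specification above is:

```ocaml
let traffic_action = function
  | "red" -> "stop"
  | "yellow" -> "slow"
  | "green" -> "go"
  | _ -> "unknown signal"
```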
The shape: three literal patterns followed by a wildcard. This is
the same skeleton as direction_label earlier, with one fewer
literal clause and different strings.
Common pitfalls
A short list of mistakes that show up every cohort.
Pitfall 1: variable-first, specific-second. As we saw,
putting | x -> ... before | 0 -> ... makes the second clause
dead. The compiler warns; read the warning. The order is
specific-first, general-last.
Pitfall 2: forgetting that the leading | is optional. The very
first clause does not need a leading bar; every subsequent clause
does. Both styles compile, so neither is an error when you meet it
in someone else's code; pick one and be consistent in your own.
Most of the OCaml ecosystem uses a leading bar on every clause,
including the first, for vertical alignment.
Pitfall 3: forgetting that match is an expression. All clauses
must produce values of the same type. If one clause returns
"hello" and another returns 42, the compiler will reject the
whole match.
This is the same expression-typing rule we saw for if/else in
the if-expressions lecture.
Pitfall 4: forgetting the wildcard on int or string. OCaml
will let you write a match on int with only specific cases
(0 -> ..., 1 -> ...), and will warn that you have not covered
2, 3, and so on. Add a wildcard. The compiler is right; the
match is incomplete.
Activity
Predict before reading on.
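The activity's cell is not reproduced in this copy. Based on the explanation in the solution, it had a variable clause ahead of a clause returning 99 for 0; the name g and the body of the first clause below are assumptions. Predict the value of g 0.

```ocaml
(* Assumed reconstruction: only the 0 -> 99 clause is described in the text. *)
let g = function
  | x -> x + 1
  | 0 -> 99
```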
Reference solution
The variable pattern x matches everything. The clause that says
"return 99 when the input is 0" never runs because x already
matched the input before 0 got a chance. Putting the specific
case first restores the intended behaviour.
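A sketch of the reordered version, using the same assumed function g with a variable clause that adds one:

```ocaml
(* Specific case first: 0 is now handled before the catch-all variable. *)
let g = function
  | 0 -> 99
  | x -> x + 1
```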
This is the cleanest illustration of why pattern order matters, and why "specific first, general last" is the rule to internalise.
What's next
Lecture 2 pairs the patterns
we have seen with the
recursive types from Module 4.
With one new piece of
notation ([] and h :: t for lists, Leaf and Node for
trees), pattern matching turns into the canonical shape of every
list and tree function: one clause per constructor, recursing on
the structurally smaller sub-value. After that, Lectures 3 to 5
generalise: patterns inside patterns (including record patterns
and or-patterns), when guards, and exhaustiveness. Lecture 6 is
the tutorial.
Reading
- Cornell CS3110, Pattern matching: https://cs3110.github.io/textbook/chapters/data/pattern_matching.html
- Real World OCaml, Lists and patterns (the pattern-matching sections): https://dev.realworldocaml.org/lists-and-patterns.html
Sources
This lecture's prose, worked examples, and quizzes are original to
this course. Materials referenced during preparation are listed in
the Reading section above; Cornell CS3110 and Real World OCaml
are CC BY-NC-ND-licensed and have not been derivatively reused.
See LICENSES.md
at the repository root for the full source posture.