Nested patterns and or-patterns
The pattern forms we saw in Lecture 1
(literals, variables, wildcards) match a value at exactly one level:
"this is the constant 0", or "this is anything, call it x." Real
OCaml values are usually built out of pieces: a tuple
holds two things, a constructor
wraps a payload, a list
is a head followed by a tail. Patterns mirror that nested structure.
A pattern can contain other patterns inside it, exactly the way a
value contains other values.
This lecture covers two extensions:
- Nested patterns: a pattern inside another pattern.
Some (x, _)is a pattern with a tuple pattern inside a constructor pattern.(0, _) :: _is a pattern with a tuple inside a cons inside another cons-context. - Or-patterns: alternation.
1 | 2 | 3is a pattern that matches1,2, or3. Or-patterns let you give several shapes the same right-hand side.
Both forms compose, and you will use them heavily. Almost every nontrivial pattern match you write uses one or both.
Patterns inside constructors
The most common nested pattern is to look inside a constructor's
payload. We saw variant types in Module 4
and you have seen Some and None and you have seen records
and lists.
Start with a record. Suppose we want to describe where a point
sits relative to the axes. Without any nesting, you would write
the logic as a cascade of ifs on the two fields:
It works, but the shape of the four cases is buried inside boolean tests. A record pattern with nested field-patterns lets us write the same function as a single match:
Each clause describes a shape: "x is zero and y is zero," "x is zero and y is anything," and so on. The four shapes are visible at a glance.
Notice that inside the curly braces, we are not just listing
field names: each field is matched against a pattern. x = 0.0 is the field-match pattern "x must equal 0.0." y = _
is "y can be anything." Both are sub-patterns of the larger
record pattern. When you read the clause { x = 0.0; y = 0.0 },
read it as: this record's x-field matches 0.0, and its
y-field matches 0.0. Both must hold.
Record patterns: name only the fields you need
Often you do not want to match on values at all; you just want
to bind the fields you need and ignore the rest. The
shorthand form drops the = and uses just the field name:
{ name; age; _ } does three things at once: it matches any
user record (the type is inferred from the field names), binds
name and age locally, and the trailing _ says "there are
other fields; I am ignoring them on purpose."
Without the trailing _, OCaml warns:
Warning 9 [missing-record-field-pattern]: the following labels
are not bound in this record pattern: admin
This warning is worth keeping on: when the record gains a new
field later, every existing pattern lights up so you decide
case-by-case whether the new field matters. With _, you opt
out of that warning for this pattern, saying "I have seen the
new field, and I do not need to update this site."
The shorthand { name; age } (no equals signs) is the everyday
form. The longer { name = name; age = age } is equivalent but
verbose. Use the short form unless you want to rename a field
locally.
Renaming a field on the way in
Sometimes you want to bind a field to a different name. The
syntax is { field = local_name }, the same form as the
value-match record but with a variable pattern instead of a
literal:
name = n says "match the name field, and call it n
locally." admin alone uses the shorthand (sugar for admin = admin).
When you explicitly list one or more specific fields, ignoring
the rest is implicit (no _ needed). Some codebases write _
anyway to make the intent more visible. Both styles are common.
Patterns inside a variant constructor
A pattern can also nest inside a variant constructor:
The pattern Circle 1.0 is a constructor pattern (Circle) with
a literal pattern (1.0) for its payload. The match succeeds if
both checks succeed: the constructor is Circle, and its
argument equals 1.0. If you wrote Circle x, the second check
would be "x is anything, bind it to the name x." If you wrote
Circle (_), the second check would be "anything, do not bind."
The point is that a constructor's payload position takes any
pattern; you choose how specific to be.
The same nesting works for tuple constructors:
The pattern Rectangle (1.0, 1.0) looks for the constructor
Rectangle whose payload is the tuple (1.0, 1.0). Two literal
patterns nested inside a tuple pattern inside a constructor
pattern. Three levels.
Inline records inside constructors
Since OCaml 4.03, a constructor's payload can be a record declared inline. This is the cleanest way to make a payload's fields self-documenting, and the pattern destructures the record exactly like any other record pattern:
Click of { x : int; y : int } is an inline record payload. You
do not need to declare a separate type click = ... and use
Click of click; the record is part of the constructor's
definition.
Inside the pattern, Click { x; y } destructures the inline
record exactly like any other record pattern. You can ignore
fields with _, rename fields with field = local_name, or
list only some, all the same way we just saw.
A small caveat: a value of type event cannot have a value of
type "click record" lifted out separately. The inline record
exists only as the payload of Click. If you want to share the
record shape between constructors, define it as a named record
type instead.
Nesting in lists
Lists are where nested patterns earn their keep. A list is built
from [] (empty) and :: (cons), and most list-processing
functions pattern-match on those constructors. Once you start
combining :: with other patterns, the nesting gets interesting
fast.
Here is "the first component of the first pair":
Read the pattern (x, _) :: _ from the outside in:
- The outermost shape is
something :: something_else, a non-empty list. Call the two pieces head and tail. - The head is
(x, _): a tuple whose first component is bound toxand whose second is discarded. - The tail is
_: anything, discarded.
So we have three nested patterns in one clause:
- A cons pattern at the top.
- A tuple pattern in the head position.
- Variable and wildcard patterns inside the tuple.
This is the everyday shape for "look inside the first element." Almost every list-processing function in real OCaml code has at least one clause that looks like this.
You can go deeper. A pattern can extract the second element too:
The pattern a :: b :: _ matches a list of length at least two:
the first element bound to a, the second to b, the rest
discarded. This is just :: cascaded, but read it carefully: it
parses as a :: (b :: _), i.e. a cons whose head is a and
whose tail is itself a cons (head b, tail anything). The
right-associative reading of :: is what makes this work.
A note on reading nested patterns
The trick to reading deeply nested patterns is the same as reading nested JSON or nested HTML: peel from the outside, and name what you see at each layer. With practice you will start to see the shape instinctively. Until then, a useful exercise is to say the pattern out loud:
"(x, _) :: _: a list whose head is a pair, the first component
of the pair bound to x, the rest of the pair and the rest of
the list ignored."
If you can say it cleanly in English, the pattern is doing what you think it does.
Matching a tuple of values: the diagonal idiom
A common shape in real code is "the answer depends on the
combination of two values." The classic case is comparing two
options: we want one branch each for (None, None),
(Some _, None), (None, Some _), and (Some _, Some _).
The obvious way is to nest one match inside another, once per
value:
It works, but the structure is muddled: the four logical cases
are spread across two matches, the parentheses around the
inner match are required, and the indentation grows with each
level. Worse, exhaustiveness checking happens twice (once per
match) instead of on the cross-product.
The cleaner option is to form a tuple of the two values and
match the whole tuple at once. match a, b with is shorthand
for match (a, b) with: the comma forms an implicit tuple, and
each clause matches a pair of patterns:
The four clauses line up visually, the cases read in any order, and the compiler checks exhaustiveness on the 2x2 cross-product in one pass. Reach for the tuple form whenever the answer depends on the combination of two (or more) related values.
Or-patterns: shared right-hand sides
Often you want several patterns to share the same right-hand side. A classic example: testing whether a character is a vowel.
The | between 'a', 'e', 'i', 'o', 'u' is the
or-pattern combinator. It is not the same as the leading | at
the start of each clause. The leading | separates clauses; the
internal | combines sub-patterns. Reading left to right: "match
if the value is 'a', or if it is 'e', or if it is 'i', or
..." If any alternative matches, the whole or-pattern matches and
the right-hand side runs.
Without or-patterns, the same logic needs five clauses:
let is_vowel = function
| 'a' -> true
| 'e' -> true
| 'i' -> true
| 'o' -> true
| 'u' -> true
| _ -> false
The or-pattern version is shorter and reads better. It also groups related cases visually, which makes intent clearer.
Or-patterns work on variants too:
East | West -> true reads "if it is East or West, return
true." The two groups partition the four constructors. The
compiler can see that the four constructors are all covered, so
this is exhaustive: no warning needed.
The binding constraint on or-patterns
There is one rule about or-patterns that, when you violate it, produces a confusing error message. The rule is:
Every alternative of an or-pattern must bind exactly the same set of variables, at the same types.
That is, if the left alternative binds x : int, the right
alternative must also bind x : int. Not y; not x at a
different type. The reason is that the right-hand side of the
clause references those names, and the compiler needs to know
they are bound regardless of which alternative succeeded.
If you write | A x | B y -> x, the compiler complains because
the right-hand side mentions x, but x is only bound in the
left alternative; if the value were B, there would be no x.
The reverse problem appears too: if your right-hand side uses
y, the compiler complains because y is only bound in the
right alternative.
The fix is to use the same name in both alternatives, and to
mean the same thing by it. In A x | B x -> x, both A and B
carry an int, and we name that int x in either case. The
right-hand side x is then unambiguously typed int.
This constraint is what makes or-patterns type-safe: the compiler can guarantee, just from the pattern, that the right-hand side has consistent types for all the names it references.
Combining or-patterns and nesting
Or-patterns can appear anywhere a pattern can appear, including inside other patterns. This lets you write quite elegant clauses:
The pattern (0 | 1) :: _ says: a non-empty list whose head is
0 or 1. The or-pattern (0 | 1) sits in the head position of
the cons. The parentheses are necessary: without them, 0 | 1 :: _ would parse incorrectly. As a rule, parenthesise an or-pattern
whenever it appears nested inside another pattern.
You can also or together more elaborate sub-patterns. Here is "a non-empty list whose first element is either zero or negative":
The or-pattern has four literal alternatives. (For a real "non-positive" check, you would use a guard, which is the topic of Lecture 4; this is just an illustration.)
The as binder: keep a name for the whole
One more pattern form fits naturally in this lecture: as,
which lets you destructure a value and keep a name for the
whole thing.
Suppose we want to return the head of a list paired with the
list's length. Without as, you have to name the parameter and
reach for it on the right:
The clause destructures (x :: _) and also
needs the whole list (xs) on the right. We can get xs because
the parameter has a name. Inside function (no parameter name),
this trick is not available; as is the cleanest way to keep the
name in either style:
Same result. The pattern (x :: _) as xs first
destructures (x is the head); then as xs names the entire
matched list xs. The right-hand side has both names available.
as reads almost like an English aside: "the head is x, and
the whole list is xs."
Or-patterns with as
The most compelling use of as is with or-patterns. An
or-pattern (p1 | p2 | ...) cannot bind a variable in each
disjunct individually, because OCaml requires the variables of
the alternatives to agree. So if you want a name for whichever
alternative happened to match, as is the only way to do it.
as small binds whichever of 0, 1, 2 matched; as round
likewise. Without the as, you would have to split the or-pattern
back into three separate clauses just to name the value.
Putting it together: a small parser shape
A common shape in real code is a multi-clause match where each clause uses nesting and or-patterns:
The or-pattern Plus | Minus | Times groups the three operator
constructors. The complementary or-pattern groups the
non-operators. Notice Int _: that is a constructor pattern
with a wildcard payload. We do not care what integer it carries;
we only care that it is an Int.
The function reads cleanly because the structure of the data is mirrored in the structure of the patterns. This is the goal: let the patterns express what shape you are handling, and the right-hand sides express what to do.
Two checks
What does head_and_second [1; 2; 3] return, given:
Some (1, 2)Some (1, 3)Some (2, 3)None
Why: the pattern a :: b :: _ matches a list with at least
two elements. a is the head (1), b is the next element
(2), and the tail ([3]) is discarded.
The compiler rejects this with an error. Why?
- Or-patterns are not allowed in
function. - The names
xclash. - The two
xs have incompatible types:intinA,stringinB. AandBcannot be grouped.
Why: the or-pattern constraint is that every alternative
must bind the same variables at the same types. A x binds
x : int; B x binds x : string. The compiler cannot give
the right-hand side a consistent type for x, so the pattern
is rejected.
A code task:
A server's reply is either a numeric status code or a timeout.
Write should_retry : reply -> bool that returns true for
Code 502, Code 503, or Timeout, and false for everything
else. Use a single clause whose left side is an or-pattern; two of
the three alternatives are nested literal patterns inside Code.
Show reference solution
The shape: function | Code 502 | Code 503 | Timeout -> true | _ -> false. One or-pattern with three alternatives: the literals
502 and 503 live inside the Code constructor pattern, and
Timeout is a bare constructor. No alternative binds a variable,
so the same-bindings rule is trivially satisfied.
Common pitfalls
Pitfall 1: forgetting parentheses around an or-pattern. When
an or-pattern sits inside another pattern, parenthesise it.
(0 | 1) :: _ is correct; 0 | 1 :: _ is parsed differently
and probably not what you meant.
Pitfall 2: different names in different alternatives. The
or-pattern constraint says every alternative must bind the same
names. | A x | B y will not work if the right-hand side uses
either name. Use the same name in both: | A x | B x.
Pitfall 3: incompatible types in alternatives. Even with the
same name, the bound types must agree. | A x | B x fails if
A carries int and B carries string. The fix in that case
is usually to not use an or-pattern, and handle the two
constructors separately.
Pitfall 4: deep nesting without parentheses. As patterns grow, the parser can struggle. When in doubt, parenthesise. The compiler is happy with extra parentheses, and your reader will be too.
Activity
Try it before reading the solution.
Show reference solution
The or-pattern combines two constructor patterns, each with a
nested literal payload. Circle 1.0 requires the constructor
and the float 1.0; Rectangle (1.0, 1.0) requires the
constructor and the tuple (1.0, 1.0). Either match sends us
to the same right-hand side.
This is the everyday shape for "one of these specific shapes," where each shape needs its own nested check.
What's next
Lecture 4 introduces when-guards:
predicates attached to a pattern that further filter when the clause
fires. Guards let you express conditions that pure patterns cannot,
like "this list has a positive number at the front." They come with
one important caveat: they disable the compiler's exhaustiveness
check for that clause, which we will see why in
Lecture 5.
Reading
- Cornell CS3110, Pattern matching (continued): https://cs3110.github.io/textbook/chapters/data/pattern_matching.html
- Real World OCaml, Lists and patterns (or-patterns section): https://dev.realworldocaml.org/lists-and-patterns.html
Sources
This lecture's prose, worked examples, and quizzes are original to
this course. Materials referenced during preparation are listed in
the Reading section above; Cornell CS3110 and Real World OCaml
are CC BY-NC-ND-licensed and have not been derivatively reused.
See LICENSES.md
at the repository root for the full source posture.