Pattern matching on lists and trees
The last lecture covered three primitive pattern forms: literals,
variables, and the wildcard. Those are enough to dispatch on an
integer or a boolean, but they cannot yet take apart a recursive
value. From Module 4 you already
know two recursive types: the built-in 'a list and the binary
tree 'a tree. This lecture is where pattern matching earns its
keep: we pair each recursive type with patterns over its
constructors, and walk the structure with one clause per
constructor. By the end of the lecture we will have written
length, sum, safe hd/tl, append, size, depth, and
mirror, all in three to five lines each.
A list is built from [] and ::
Recall the built-in list type from the recursive-types lecture:
Two constructors, so two patterns. [] matches the empty list.
h :: t matches a non-empty list, binding h to the head element
and t to the rest. That is all the pattern syntax we need for
this lecture; the four functions on the next four slides all use
just these two patterns.
length: count the elements
The first function is length. The shape of the function follows
the shape of the type: one clause per constructor.
The empty list has length 0. A non-empty list has one more
element than its tail, so we add 1 and recurse on the tail.
Notice the _ for the head: we do not need the value, just the
fact that there is a head. The recursive call is on t, which
is strictly shorter than l, so the recursion is guaranteed to
terminate.
sum: total an integer list
sum has the same shape as length, but combines the elements
instead of counting them. Now we do need the head's value.
Base case: the empty list sums to 0. Recursive case: add the
head to the sum of the tail. The structural shape is identical to
length: every list-consuming function will look like this.
Safe head and tail with option
The built-in List.hd and List.tl raise an exception when the
list is empty. Exceptions are a Module 7 idea; for now we can
return option instead, matching the slogan from the
recursive-types lecture:
make illegal states unrepresentable. The function returns
None on the empty list and Some _ otherwise.
head_opt ignores the tail with _; tail_opt ignores the
head. The caller now has to deal with the None case, because
the return type forces it.
append: glue two lists together
append takes two lists and returns a single list with the first
followed by the second. The OCaml built-in is the @ operator.
We can write our own with the same two-clause shape.
The recursion is over l1. When l1 is empty, the answer is
just l2. When l1 is h :: t, the result starts with h and
continues with append t l2. Note that the right-hand :: is the
list constructor, not a pattern: we are building a list here,
not taking one apart.
Structural recursion as a discipline
Look back at the four functions. They all have the same skeleton:
match l with
| [] -> <base answer>
| h :: t -> <combine h with recursive call on t>
This is structural recursion: the function's structure mirrors the type's. There is one clause per constructor; the recursive call is on a strictly smaller sub-value (the tail). The recursion is guaranteed to terminate because every non-empty list is one cons shorter than the original.
A binary tree is Leaf | Node (l, v, r)
Recall the binary tree type from Module 4:
Two constructors again, so again two patterns. Leaf matches the
empty tree. Node (l, v, r) matches an internal node, binding the
left subtree, the value, and the right subtree. The same recipe
as lists, with the difference that the recursive constructor
holds two recursive references instead of one.
size: count the values in a tree
The function over the tree has one clause per constructor, just like a list function. The recursive case has two recursive calls, one per subtree.
The empty tree has size 0. A non-empty tree contributes 1 for
its own value plus the sizes of its two subtrees. The value at
the node does not affect the count, so we discard it with _.
depth: longest path from root to a leaf
depth measures the height of the tree. The recursive case
takes the deeper of the two subtrees and adds 1.
Stdlib.max returns the larger of two values. The empty tree has
depth 0; an internal node is one level deeper than its deepest
subtree.
mirror: swap left and right at every node
mirror returns a new tree where every Node's subtrees are
swapped. It is a tree-producing function as well as a
tree-consuming one: each clause builds the answer with the
constructors.
Empty tree mirrors to empty tree. A Node mirrors to a new
Node with the mirrored right subtree on the left and the
mirrored left subtree on the right. The value is unchanged.
Two checks
question: |
What does length [10; 20; 30] evaluate to using the
length we wrote?
options:
- text: "0"
- text: "3" correct: true
- text: "60"
- text: "An exception is raised."
explanation: |
The list has three cons cells before reaching
[]. Each cons adds1; the empty list contributes0. Total:3.
question: |
In Node (l, v, r) -> 1 + size l + size r, what is the role
of l?
options:
- text: "It is the value at the node."
- text: "It is the left subtree, which itself has type
'a tree." correct: true - text: "It is the leftmost leaf of the tree."
- text: "It is a list of values from the left side of the tree."
explanation: |
Nodeis declaredNode of 'a tree * 'a * 'a tree; the first field is the left subtree. The patternNode (l, v, r)bindslto that subtree.
question: |
Write count_leaves : 'a tree -> int that returns the number
of Leaf constructors in the tree. Pattern-match on the two
constructors.
starter: |
type 'a tree =
| Leaf
| Node of 'a tree * 'a * 'a tree
let rec count_leaves t = failwith "TODO" solution: | type 'a tree = | Leaf | Node of 'a tree * 'a * 'a tree
let rec count_leaves t = match t with | Leaf -> 1 | Node (l, _, r) -> count_leaves l + count_leaves r checks:
- call: count_leaves Leaf expect: 1
- call: count_leaves (Node (Leaf, 1, Leaf)) expect: 2
- call: | count_leaves (Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Node (Leaf, 4, Leaf)))) expect: 5
Activity
Time to try one yourself before we discuss it. The function uses
both subtrees of every Node, and the answer is a list, not a
number: a small step from mirror (tree out) towards the
linearisation walks you will write in the tutorial.
Show reference solution
Activity solution
The structural-recursion recipe applies straight off: one clause per constructor, recursive calls on the strictly smaller subtrees. The recursive clause concatenates three pieces: the left walk, a one-element list with the node value, and the right walk.
On our running example, the result is [1; 2; 3; 4]. The
tree's structure is gone; only the in-order linearisation
remains. @ is the built-in list-append we wrote by hand
earlier in the lecture.
Common pitfalls
A small set of mistakes catch almost everyone in the first week of writing list and tree functions.
What's next
Lecture 3 takes the next
step: patterns inside patterns, like (x, _) :: _ for "a list
whose head is a pair," and or-patterns like 1 | 2 | 3 -> ...
for sharing a right-hand side across several shapes. The list and
tree patterns from this lecture will appear constantly inside
those nested forms.
Reading
- Cornell CS3110, Pattern matching (list patterns and structural recursion).
- Cornell CS3110, Trees.
Sources
This lecture mirrors the Pattern Matching on Lists and
structural-recursion sections of the
CS3100 monsoon 2025 lecture 6.
The 'a tree type, the example value, and the mirror and
inorder examples are adapted from CS3100 monsoon 2025
lecture 5.