Pattern matching on lists and trees

Functional Programming with OCaml

Pattern matching on lists and trees

Module 5 · Lecture 2

KC Sivaramakrishnan
IIT Madras

The last lecture covered three primitive pattern forms: literals, variables, and the wildcard. Those are enough to dispatch on an integer or a boolean, but they cannot yet take apart a recursive value. From Module 4 you already know two recursive types: the built-in 'a list and the binary tree 'a tree. This lecture is where pattern matching earns its keep: we pair each recursive type with patterns over its constructors, and walk the structure with one clause per constructor. By the end of the lecture we will have written length, sum, safe hd/tl, append, size, depth, and mirror, all in three to five lines each.

From flat values to recursive structure

A list is built from [] and ::

Recall the built-in list type from the recursive-types lecture:

(* type 'a list = | [] | (::) of 'a * 'a list *)

Two constructors, so two patterns. [] matches the empty list. h :: t matches a non-empty list, binding h to the head element and t to the rest. That is all the pattern syntax we need for this lecture; the four functions on the next four slides all use just these two patterns.

A list is [] | h :: t

(* the two patterns we will use *) | [] -> (* empty list *) | h :: t -> (* head h, tail t *)

length: count the elements

The first function is length. The shape of the function follows the shape of the type: one clause per constructor.

let rec length l = match l with | [] -> 0 | _ :: t -> 1 + length t let _ = length [10; 20; 30] (* = 3 *) let _ = length [] (* = 0 *)

The empty list has length 0. A non-empty list has one more element than its tail, so we add 1 and recurse on the tail. Notice the _ for the head: we do not need the value, just the fact that there is a head. The recursive call is on t, which is strictly shorter than l, so the recursion is guaranteed to terminate.

length: one clause per constructor

let rec length l = match l with | [] -> 0 | _ :: t -> 1 + length t let _ = length [10; 20; 30] (* = 3 *) let _ = length [] (* = 0 *)

sum: total an integer list

sum has the same shape as length, but combines the elements instead of counting them. Now we do need the head's value.

let rec sum l = match l with | [] -> 0 | h :: t -> h + sum t let _ = sum [10; 20; 30] (* = 60 *) let _ = sum [] (* = 0 *)

Base case: the empty list sums to 0. Recursive case: add the head to the sum of the tail. The structural shape is identical to length: every list-consuming function will look like this.

sum: combine the head with the recursive result

let rec sum l = match l with | [] -> 0 | h :: t -> h + sum t let _ = sum [10; 20; 30] (* = 60 *)

Safe head and tail with option

The built-in List.hd and List.tl raise an exception when the list is empty. Exceptions are a Module 7 idea; for now we can return option instead, matching the slogan from the recursive-types lecture: make illegal states unrepresentable. The function returns None on the empty list and Some _ otherwise.

let head_opt l = match l with | [] -> None | h :: _ -> Some h let tail_opt l = match l with | [] -> None | _ :: t -> Some t let _ = head_opt [10; 20; 30] (* = Some 10 *) let _ = tail_opt [10; 20; 30] (* = Some [20; 30] *) let _ = head_opt [] (* = None *) let _ = tail_opt [] (* = None *)

head_opt ignores the tail with _; tail_opt ignores the head. The caller now has to deal with the None case, because the return type forces it.

Safe head and tail with option

let head_opt l = match l with | [] -> None | h :: _ -> Some h let tail_opt l = match l with | [] -> None | _ :: t -> Some t let _ = head_opt [10; 20; 30] (* = Some 10 *) let _ = tail_opt [10; 20; 30] (* = Some [20; 30] *) let _ = head_opt [] (* = None *)

append: glue two lists together

append takes two lists and returns a single list with the first followed by the second. The OCaml built-in is the @ operator. We can write our own with the same two-clause shape.

let rec append l1 l2 = match l1 with | [] -> l2 | h :: t -> h :: append t l2 let _ = append [1; 2; 3] [4; 5; 6] (* = [1; 2; 3; 4; 5; 6] *) let _ = append [] [4; 5; 6] (* = [4; 5; 6] *) let _ = append [1; 2; 3] [] (* = [1; 2; 3] *)

The recursion is over l1. When l1 is empty, the answer is just l2. When l1 is h :: t, the result starts with h and continues with append t l2. Note that the right-hand :: is the list constructor, not a pattern: we are building a list here, not taking one apart.

append: recurse over the first list

let rec append l1 l2 = match l1 with | [] -> l2 | h :: t -> h :: append t l2 let _ = append [1; 2; 3] [4; 5; 6] (* = [1; 2; 3; 4; 5; 6] *)

Structural recursion as a discipline

Look back at the four functions. They all have the same skeleton:

match l with
| []     -> <base answer>
| h :: t -> <combine h with recursive call on t>

This is structural recursion: the function's structure mirrors the type's. There is one clause per constructor; the recursive call is on a strictly smaller sub-value (the tail). The recursion is guaranteed to terminate because every non-empty list is one cons shorter than the original.

The shape of every list function

match l with
| []     -> <base>
| h :: t -> <combine h with the recursive result on t>

A binary tree is Leaf | Node (l, v, r)

Recall the binary tree type from Module 4:

type 'a tree = | Leaf | Node of 'a tree * 'a * 'a tree let example = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Node (Leaf, 4, Leaf)))

Two constructors again, so again two patterns. Leaf matches the empty tree. Node (l, v, r) matches an internal node, binding the left subtree, the value, and the right subtree. The same recipe as lists, with the difference that the recursive constructor holds two recursive references instead of one.

A tree is Leaf | Node

type 'a tree = | Leaf | Node of 'a tree * 'a * 'a tree
        2
       / \
      1   3
           \
            4

size: count the values in a tree

The function over the tree has one clause per constructor, just like a list function. The recursive case has two recursive calls, one per subtree.

type 'a tree = | Leaf | Node of 'a tree * 'a * 'a tree let rec size t = match t with | Leaf -> 0 | Node (l, _, r) -> 1 + size l + size r let example = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Node (Leaf, 4, Leaf))) let _ = size example (* = 4 *) let _ = size Leaf (* = 0 *)

The empty tree has size 0. A non-empty tree contributes 1 for its own value plus the sizes of its two subtrees. The value at the node does not affect the count, so we discard it with _.

size: one clause per constructor, two subtree calls

let rec size t = match t with | Leaf -> 0 | Node (l, _, r) -> 1 + size l + size r let example = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Node (Leaf, 4, Leaf))) let _ = size example (* = 4 *)

depth: longest path from root to a leaf

depth measures the height of the tree. The recursive case takes the deeper of the two subtrees and adds 1.

type 'a tree = | Leaf | Node of 'a tree * 'a * 'a tree let rec depth t = match t with | Leaf -> 0 | Node (l, _, r) -> 1 + max (depth l) (depth r) let example = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Node (Leaf, 4, Leaf))) let _ = depth example (* = 3 *) let _ = depth Leaf (* = 0 *)

Stdlib.max returns the larger of two values. The empty tree has depth 0; an internal node is one level deeper than its deepest subtree.

depth: take the deeper subtree, add 1

let rec depth t = match t with | Leaf -> 0 | Node (l, _, r) -> 1 + max (depth l) (depth r) let _ = depth example (* = 3 *)

mirror: swap left and right at every node

mirror returns a new tree where every Node's subtrees are swapped. It is a tree-producing function as well as a tree-consuming one: each clause builds the answer with the constructors.

type 'a tree = | Leaf | Node of 'a tree * 'a * 'a tree let rec mirror t = match t with | Leaf -> Leaf | Node (l, v, r) -> Node (mirror r, v, mirror l) let _ = mirror (Node (Node (Leaf, 1, Leaf), 2, Leaf)) (* = Node (Leaf, 2, Node (Leaf, 1, Leaf)) *) let _ = mirror Leaf (* = Leaf *)

Empty tree mirrors to empty tree. A Node mirrors to a new Node with the mirrored right subtree on the left and the mirrored left subtree on the right. The value is unchanged.

mirror: rebuild with swapped subtrees

let rec mirror t = match t with | Leaf -> Leaf | Node (l, v, r) -> Node (mirror r, v, mirror l) let _ = mirror (Node (Node (Leaf, 1, Leaf), 2, Leaf)) (* = Node (Leaf, 2, Node (Leaf, 1, Leaf)) *)
  • LHS Node (l, v, r): pattern that takes apart a node.
  • RHS Node (mirror r, v, mirror l): constructor that builds a new node.

input:

   2
  /
 1

output:

2
 \
  1

Two checks

question: | What does length [10; 20; 30] evaluate to using the length we wrote?

let rec length l = match l with | [] -> 0 | _ :: t -> 1 + length t

options:

question: | In Node (l, v, r) -> 1 + size l + size r, what is the role of l? options:

question: | Write count_leaves : 'a tree -> int that returns the number of Leaf constructors in the tree. Pattern-match on the two constructors. starter: | type 'a tree = | Leaf | Node of 'a tree * 'a * 'a tree

let rec count_leaves t = failwith "TODO" solution: | type 'a tree = | Leaf | Node of 'a tree * 'a * 'a tree

let rec count_leaves t = match t with | Leaf -> 1 | Node (l, _, r) -> count_leaves l + count_leaves r checks:

Activity

Time to try one yourself before we discuss it. The function uses both subtrees of every Node, and the answer is a list, not a number: a small step from mirror (tree out) towards the linearisation walks you will write in the tutorial.

Activity

type 'a tree = | Leaf | Node of 'a tree * 'a * 'a tree let rec inorder t = failwith "TODO"
Show reference solution

Activity solution

The structural-recursion recipe applies straight off: one clause per constructor, recursive calls on the strictly smaller subtrees. The recursive clause concatenates three pieces: the left walk, a one-element list with the node value, and the right walk.

type 'a tree = | Leaf | Node of 'a tree * 'a * 'a tree let rec inorder t = match t with | Leaf -> [] | Node (l, v, r) -> inorder l @ [v] @ inorder r let example = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Node (Leaf, 4, Leaf))) let _ = inorder example (* = [1; 2; 3; 4] *) let _ = inorder Leaf (* = [] *)

On our running example, the result is [1; 2; 3; 4]. The tree's structure is gone; only the in-order linearisation remains. @ is the built-in list-append we wrote by hand earlier in the lecture.

Activity solution: inorder

let rec inorder t = match t with | Leaf -> [] | Node (l, v, r) -> inorder l @ [v] @ inorder r let example = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Node (Leaf, 4, Leaf))) let _ = inorder example (* = [1; 2; 3; 4] *)
  • Leaf returns the empty list.
  • Node (l, v, r) returns inorder l @ [v] @ inorder r.
  • @ is the built-in list-append (same shape as the append we wrote earlier in this lecture).
  • inorder example = [1; 2; 3; 4].

Common pitfalls

A small set of mistakes catch almost everyone in the first week of writing list and tree functions.

Common pitfalls

What's next

Lecture 3 takes the next step: patterns inside patterns, like (x, _) :: _ for "a list whose head is a pair," and or-patterns like 1 | 2 | 3 -> ... for sharing a right-hand side across several shapes. The list and tree patterns from this lecture will appear constantly inside those nested forms.

What's next

Reading

Sources

This lecture mirrors the Pattern Matching on Lists and structural-recursion sections of the CS3100 monsoon 2025 lecture 6. The 'a tree type, the example value, and the mirror and inorder examples are adapted from CS3100 monsoon 2025 lecture 5.