Mutable records and arrays
Beyond the single-cell ref, OCaml has
two more mutable building blocks: records with mutable fields
and fixed-size arrays. The
previous lecture introduced ref and
the equational-reasoning tradeoff that comes with mutation. This
lecture introduces general mutable record fields; once we have
those, we will see that ref is just the one-field special case.
Then we spend most of our time on arrays: when they are the right
tool, how to allocate and access them, how they compare with
the lists we have been using all
course, and the loop syntax that lives alongside them.
The decision of when in-place mutation pays for the equational reasoning you give up runs through both halves of the lecture.
Mutable record fields
You mark a record field mutable when you
want to update it in place after the record has been constructed.
Only fields marked mutable can be updated: trying
to write c.name <- "x" would be a compile-time error. The <-
operator is the field-assignment operator. It takes a field-access
expression on the left and a new value on the right, and produces
unit.
Marking just the fields that change makes design intent visible
right at the type. A reader sees mutable n : int and knows the
counter's number changes; they see name : string (no mutable)
and know the label does not. The constraint is checked by the
compiler: any code that tries to assign to name is rejected.
In a record like this, the type declaration is part of the documentation. The capacity is set once; the size and contents update as the buffer fills.
A ref is just a record with one mutable field
Now that you have seen mutable record fields, the implementation
of ref is no longer mysterious. The standard library defines
'a ref as a one-field record with that field marked mutable:
type 'a ref = { mutable contents : 'a }
The operators ! and := are just shorthand for accessing and
updating that field:
!risr.contents.r := xisr.contents <- x.
So ref is not a magic builtin: it is exactly a record with one
mutable field, and the mutable-field machinery you just saw is
what ref uses under the hood.
For named mutable state, the record form is often more readable:
c.n <- c.n + 1 reads like an imperative assignment. The ref
form, with ! and :=, is more concise when you have a single
cell. Both compile to the same thing.
A worked example: a doubly-linked list
Mutable record fields earn their keep when a single update has to be visible through several pointers. The canonical example is a doubly-linked list: each node points to its successor and its predecessor, so insertion and deletion at the middle of the list takes O(1) once you have a handle on the node.
The shape:
Two design choices worth noticing. The next and prev fields
are option so a
node at either end can record "no neighbour." A list is a ref of
the head node (also option, since the list might be empty); the
ref lets the head pointer itself change when we prepend a new
element. The node type combines both kinds of mutation we have
seen: mutable fields inside the node, and a ref for the
overall handle.
The empty list is a fresh ref of None. insert_first adds a
new node at the front; because the list is doubly linked, the new
node has to be hooked up and the old head (if there was one)
has to learn that it has a new predecessor.
The function mutates two places: old_head.prev (a mutable
field on the existing first node) and t itself (the ref holding
the head pointer). Both updates are needed; missing either leaves
the list in an inconsistent state.
Walking the list is a normal recursion over the option chain,
producing side effects on each value:
A short demo, inserting 3 then 2 then 1 (so the list reads 1, 2, 3 head-to-tail):
The iter line prints 1 2 3. The same data lives in the chain
of nodes; iter walks it forwards from the head handle. We do
not keep a tail handle in dllist, so to walk backwards from
the tail we would first have to walk forward to find the tail
and then step prev. The reason we still bother with prev is
that once we have a node in hand (returned by a search, or the
current cursor of an iteration) we can splice it out or step
sideways in O(1).
Each insert_first line also prints the resulting node. Look at
the output for insert_first l 2:
- : int node =
{value = 2; next = Some {value = 3; next = None;
prev = Some <cycle>}; prev = None}
The <cycle> is the toplevel's way of saying "this points back
somewhere we have already printed." Node 2's next is node 3,
and node 3's prev is node 2: a cycle in the value graph. The
pretty-printer follows the next link into node 3, then meets
the prev link back to node 2, and rather than infinitely
recursing it writes <cycle> and stops. Cyclic values are
exactly the kind of thing that immutable construction cannot
produce (the values do not exist yet to refer to each other);
the mutable fields are what let us tie the knot, and <cycle>
is the visible signature of a tied knot in the toplevel.
This is the smallest example that puts together everything we have
seen: a recursive type, a record
with mutable fields, an option
to mark the ends of the chain, and a ref
to let the handle change. It is also a good place to feel why
mutation is the right tool here: every linked-list operation that
preserves links must update them in place, and threading new lists
through every call would be both painful and miss the point.
Arrays
An array is a fixed-size, mutable sequence of values, all of the same type. The literal syntax uses bar-brackets:
A few syntactic things to note:
- Array literals are
[| e0; e1; ...; en |], with semicolons between elements. The bars distinguish them from lists, which use plain brackets. - Indexing is
a.(i). The parentheses are mandatory; this is not the dot you use for records. - Assignment is
a.(i) <- value, the same<-operator we just saw on records. - Indexing is zero-based.
- Out-of-bounds access raises the standard exception
Invalid_argument(we cover exceptions in the next lecture).
Why use array notation a.(i) instead of square brackets a[i]
like C, Java, or Python? Square brackets are already taken by list
syntax ([1; 2; 3]), and the language designers wanted indexing
to look syntactically distinct from list construction. The
parenthesised dot form was the result. The compiler treats
a.(i) and a.(i) <- v as primitive operations: there is no
function call overhead and they compile to direct array loads and
stores.
Lists vs arrays
The choice between a list and an array is a recurring question. The two have different cost profiles and different relationships to immutability.
'a list |
'a array |
|
|---|---|---|
| Indexed access | O(n) | O(1) |
| Cons / prepend | O(1) | not direct |
| Append / extend | O(n) | not direct |
| Immutable | yes | no |
| Length fixed | no | yes |
| Sharing tails | yes | no |
| Equational reasoning | yes | no for mutated cells |
The standard trade-off is between O(1) random access (arrays) and O(1) prepending plus immutability (lists). If your computation walks the data front to back, building up a result as it goes, a list is usually the more natural shape; we have seen this all through the pattern-matching and higher-order-functions modules. If your computation needs to jump to arbitrary positions, an array is the right tool.
Neither structure is good for "dynamically growing with fast
indexed access." For that, OCaml 5.2 introduced Dynarray, a
resizable array akin to C++'s std::vector or Java's ArrayList;
for byte-string building, there is Buffer. We will not use
either in this course, but it is good to know what your options
are when you outgrow the trade-off above.
Allocating arrays
You rarely write a large array as a literal. The standard library gives you three workhorse constructors.
Array.make n x allocates an array of length n with every
element initialised to x. Array.init n f allocates an array of
length n where element i is computed by calling f i.
Array.of_list converts an existing list into an array.
Array.init is the one to remember: it is the array equivalent
of writing a list comprehension or a generator. Given a length and
a function from index to value, it allocates the array and runs
the function on each index. You do not see the function called
in a particular order, but in practice it is 0, 1, ..., n-1.
Iterating arrays
The Array module mirrors
the higher-order functions we have seen on lists.
Array.iter is the side-effecting walk;
Array.map returns a new array; there are also
fold_left, fold_right, length,
to_list, and the obvious shape-shifters.
This prints 10, 20, 30 on separate lines. Array.iter
returns unit; it is for effect, not for value.
For a pure transformation that does not mutate the input, use
Array.map:
Array.map allocates a new array and leaves the input
unchanged. If you want to update the
input in place, use Array.iteri and write back explicitly, or
use the loop syntax we are about to see.
OCaml's for and while loops
OCaml has imperative loops, and they live in the language precisely to go with arrays and mutation. They are not the only way to write iteration, and (as we have insisted all course) they are not the default; but when you reach for an array, you usually reach for a loop alongside it.
The two forms:
for i = lo to hi do
body
done
for i = hi downto lo do
body
done
while condition do
body
done
for i = lo to hi do body done runs body once for each i from
lo to hi inclusive. The variable i is in scope inside the
body. for i = hi downto lo do body done is the reverse. while condition do body done runs the body repeatedly while the
condition is true. All three are expressions of type unit.
The body must itself be of type unit. A loop whose body returns
some other type triggers a warning, the same way a sequence does:
the value is being discarded.
A typical use: counting characters
Here is the kind of code where arrays earn their keep. Suppose you want to count how many times each character appears in a string. You make an array of length 256, indexed by character code, mutate it as you scan the string.
Two 'l's, one 'o'. The shape of the algorithm is exactly what an imperative loop in C would do: walk through the input, indexing by the current character into a fixed-size table, incrementing the counter.
Could you do this without mutation? Yes: walk the string with a
fold, building up a 256-tuple or a Map, returning a new structure
at each step. It would be much slower and much longer to read.
This is exactly the case where mutation is the right tool.
When you do not want mutation
For most everyday list-shaped work, a fold or map is clearer than an array-and-loop.
The first version is one line and produces no
intermediate mutable state. The second version is three lines and
needs a ref. (You could also write the second with
Array.fold_left (+) 0 a, which is again one line.)
The discipline is the same as for ref: reach for arrays when
the algorithm wants random-access mutation. If your algorithm is
"walk the data and accumulate a result," a fold is clearer.
A quick check
What does the following expression evaluate to?
9105108Invalid_argument
Why: the assignment changes a.(2) from 3 to 99. The
sum is 1 + 99 + 5 = 105.
What is the type of Array.make 5 0.0?
int arrayfloat arrayint * floatfloat
Why: Array.make has type int -> 'a -> 'a array. The length
5 is the first argument; the initial value 0.0 is the second.
The inferred type of 'a is float, so the result is
float array.
Activity
Write reverse_in_place that reverses an array in place.
Show reference solution
Reference solution: the classic two-pointer swap.
Show reference solution
A few things worth noticing in the solution. The loop runs to
n / 2 - 1 rather than n - 1: each iteration swaps two
positions, so we only need to walk halfway through. For an array
of odd length, the middle element is its own mirror and stays in
place. The function returns unit; its observable effect is the
mutation. This is the standard signature pattern for in-place
operations: the function returns unit and the caller passes in
the structure to be modified.
If you want an immutable reverse instead, the right path is
usually Array.of_list (List.rev (Array.to_list a)), or simpler,
keep the data as a list in the first place.
When the count isn't known: while
for is the right loop when you know the count of iterations in
advance, as in reverse_in_place above. while is the loop you
reach for when the iteration stops on a runtime condition: the
index where a search succeeds, the iteration where a fixpoint is
reached, the point in an input stream where the structure
changes. A small worked example, linear search:
Two things to notice. First, the loop variable
i is a ref, because the body needs to mutate it. (OCaml's
for gives you a fresh immutable binding each iteration; while
is where refs and loops fit together.) Second, the loop condition
uses && for short-circuit: when !i = n, the second clause
not (p a.(!i)) is not evaluated, so we never index past the
end of the array. Stop conditions like that are exactly what
while is for.
The same shape works for any "until found / until stable / until
out of input" pattern: a few refs for the state the loop is
threading, a while whose condition is the negation of the
termination criterion, and a final read of the refs to extract
the result.
Closing: default to immutable
The mutation toolkit (refs, mutable fields, arrays) is in the language because some algorithms genuinely want it: random-access table updates, doubly-linked structures, in-place reverses, callbacks that push state into the world. But everything we have built so far, all the way through the higher-order-functions module, did without it. There is a reason to keep that as the default.
Immutable data has three concrete payoffs. First, you do not have
to think about aliasing:
two names for the same value behave identically whether they share
a heap object or not, because the heap object cannot change.
Second, the implementation is free to share structure cheaply: a
new list x :: xs shares all of xs with the original, no copy
needed. Third, immutable data is a perfect fit for concurrency:
two threads (or two domains, or two distant cores) can read the
same value at once without locks, because there is nothing to
race over. We will return to that point when we discuss
data races.
The recommendation, then, is the same one OCaml itself follows: use immutable data structures by default, and reach for mutation when performance, expressivity, or interop genuinely requires it. The skill is recognising which of those three things is actually asking, versus which of them just feels like the imperative habit from another language.
What's next
The next lecture covers exceptions, the third member of the imperative trio (alongside refs and arrays). Exceptions let you signal "something went wrong" without threading an option or result through every layer of code. After that, Lectures 4 and 5 cover streams and laziness and memoization (a pair of techniques that build on refs and exceptions). Lectures 6 through 8 then turn to modules and the way OCaml organizes code at scale.
Reading
- Cornell CS3110, Arrays: https://cs3110.github.io/textbook/chapters/mut/arrays.html
- Cornell CS3110, Mutable fields: https://cs3110.github.io/textbook/chapters/mut/mutable_fields.html
Sources
This lecture's prose, worked examples, and quizzes are original to
this course. Materials referenced during preparation are listed in
the Reading section above; Cornell CS3110 and Real World OCaml
are CC BY-NC-ND-licensed and have not been derivatively reused.
See LICENSES.md
at the repository root for the full source posture.