Mutable records and arrays

Functional Programming with OCaml

Mutable records and arrays

Module 7 · Lecture 2

KC Sivaramakrishnan
IIT Madras

Beyond the single-cell ref, OCaml has two more mutable building blocks: records with mutable fields and fixed-size arrays. The previous lecture introduced ref and the equational-reasoning tradeoff that comes with mutation. This lecture introduces general mutable record fields; once we have those, we will see that ref is just the one-field special case. Then we spend most of our time on arrays: when they are the right tool, how to allocate and access them, how they compare with the lists we have been using all course, and the loop syntax that lives alongside them.

The decision of when in-place mutation pays for the equational reasoning you give up runs through both halves of the lecture.

This lecture: mutable records and arrays

Mutable record fields

You mark a record field mutable when you want to update it in place after the record has been constructed.

type counter = { mutable n : int; name : string } let c = { n = 0; name = "visits" } let () = c.n <- c.n + 1 let () = c.n <- c.n + 1 let _ = c.n (* = 2 *) let _ = c.name (* = "visits" *)

Only fields marked mutable can be updated: trying to write c.name <- "x" would be a compile-time error. The <- operator is the field-assignment operator. It takes a field-access expression on the left and a new value on the right, and produces unit.

Mutable record fields

You declare a field mutable when you want to update it in place:

type counter = { mutable n : int; name : string } let c = { n = 0; name = "visits" } let () = c.n <- c.n + 1 let () = c.n <- c.n + 1 let _ = c.n (* = 2 *) let _ = c.name (* = "visits" *)

Marking just the fields that change makes design intent visible right at the type. A reader sees mutable n : int and knows the counter's number changes; they see name : string (no mutable) and know the label does not. The constraint is checked by the compiler: any code that tries to assign to name is rejected.

type buffer = { capacity : int; (* fixed at creation *) mutable size : int; (* changes as we push *) mutable contents : string (* changes as we push *) }

In a record like this, the type declaration is part of the documentation. The capacity is set once; the size and contents update as the buffer fills.

Why some fields mutable and others not

type buffer = { capacity : int; (* fixed at creation *) mutable size : int; (* changes as we push *) mutable contents : string (* changes as we push *) }

A ref is just a record with one mutable field

Now that you have seen mutable record fields, the implementation of ref is no longer mysterious. The standard library defines 'a ref as a one-field record with that field marked mutable:

type 'a ref = { mutable contents : 'a }

The operators ! and := are just shorthand for accessing and updating that field:

So ref is not a magic builtin: it is exactly a record with one mutable field, and the mutable-field machinery you just saw is what ref uses under the hood.

For named mutable state, the record form is often more readable: c.n <- c.n + 1 reads like an imperative assignment. The ref form, with ! and :=, is more concise when you have a single cell. Both compile to the same thing.

A ref is just a record with one mutable field

'a ref is literally:

type 'a ref = { mutable contents : 'a }

A worked example: a doubly-linked list

Mutable record fields earn their keep when a single update has to be visible through several pointers. The canonical example is a doubly-linked list: each node points to its successor and its predecessor, so insertion and deletion at the middle of the list takes O(1) once you have a handle on the node.

The shape:

type 'a node = { value : 'a; mutable next : 'a node option; mutable prev : 'a node option; } type 'a dllist = 'a node option ref

Two design choices worth noticing. The next and prev fields are option so a node at either end can record "no neighbour." A list is a ref of the head node (also option, since the list might be empty); the ref lets the head pointer itself change when we prepend a new element. The node type combines both kinds of mutation we have seen: mutable fields inside the node, and a ref for the overall handle.

A doubly-linked list

Each node points to its successor and its predecessor:

type 'a node = { value : 'a; mutable next : 'a node option; mutable prev : 'a node option; } type 'a dllist = 'a node option ref

The empty list is a fresh ref of None. insert_first adds a new node at the front; because the list is doubly linked, the new node has to be hooked up and the old head (if there was one) has to learn that it has a new predecessor.

let create () : 'a dllist = ref None let insert_first (t : 'a dllist) v = let n = { value = v; prev = None; next = !t } in (match !t with | Some old_head -> old_head.prev <- Some n | None -> ()); t := Some n; n

The function mutates two places: old_head.prev (a mutable field on the existing first node) and t itself (the ref holding the head pointer). Both updates are needed; missing either leaves the list in an inconsistent state.

Walking the list is a normal recursion over the option chain, producing side effects on each value:

let iter (t : 'a dllist) f = let rec walk = function | None -> () | Some node -> f node.value; walk node.next in walk !t

create and insert_first

let create () : 'a dllist = ref None let insert_first (t : 'a dllist) v = let n = { value = v; prev = None; next = !t } in (match !t with | Some old_head -> old_head.prev <- Some n | None -> ()); t := Some n; n

iter over the chain

let iter (t : 'a dllist) f = let rec walk = function | None -> () | Some node -> f node.value; walk node.next in walk !t

A short demo, inserting 3 then 2 then 1 (so the list reads 1, 2, 3 head-to-tail):

let l = create () let _ = insert_first l 3 let _ = insert_first l 2 let _ = insert_first l 1 let () = iter l (fun x -> print_int x; print_string " ") let () = print_newline ()

The iter line prints 1 2 3. The same data lives in the chain of nodes; iter walks it forwards from the head handle. We do not keep a tail handle in dllist, so to walk backwards from the tail we would first have to walk forward to find the tail and then step prev. The reason we still bother with prev is that once we have a node in hand (returned by a search, or the current cursor of an iteration) we can splice it out or step sideways in O(1).

Each insert_first line also prints the resulting node. Look at the output for insert_first l 2:

- : int node =
{value = 2; next = Some {value = 3; next = None;
                         prev = Some <cycle>}; prev = None}

The <cycle> is the toplevel's way of saying "this points back somewhere we have already printed." Node 2's next is node 3, and node 3's prev is node 2: a cycle in the value graph. The pretty-printer follows the next link into node 3, then meets the prev link back to node 2, and rather than infinitely recursing it writes <cycle> and stops. Cyclic values are exactly the kind of thing that immutable construction cannot produce (the values do not exist yet to refer to each other); the mutable fields are what let us tie the knot, and <cycle> is the visible signature of a tied knot in the toplevel.

Demo

let l = create () let _ = insert_first l 3 let _ = insert_first l 2 let _ = insert_first l 1 let () = iter l (fun x -> print_int x; print_string " ") let () = print_newline ()

Prints 1 2 3.

This is the smallest example that puts together everything we have seen: a recursive type, a record with mutable fields, an option to mark the ends of the chain, and a ref to let the handle change. It is also a good place to feel why mutation is the right tool here: every linked-list operation that preserves links must update them in place, and threading new lists through every call would be both painful and miss the point.

Arrays

An array is a fixed-size, mutable sequence of values, all of the same type. The literal syntax uses bar-brackets:

let a = [| 10; 20; 30; 40; 50 |] let _ = a.(0) (* = 10 *) let _ = a.(2) (* = 30 *) let () = a.(2) <- 999 let _ = a (* = [|10; 20; 999; 40; 50|] *)

A few syntactic things to note:

Arrays

A fixed-size, mutable sequence:

let a = [| 10; 20; 30; 40; 50 |] let _ = a.(0) (* = 10 *) let _ = a.(2) (* = 30 *) let () = a.(2) <- 999 let _ = a (* = [|10; 20; 999; 40; 50|] *)

Why use array notation a.(i) instead of square brackets a[i] like C, Java, or Python? Square brackets are already taken by list syntax ([1; 2; 3]), and the language designers wanted indexing to look syntactically distinct from list construction. The parenthesised dot form was the result. The compiler treats a.(i) and a.(i) <- v as primitive operations: there is no function call overhead and they compile to direct array loads and stores.

Lists vs arrays

The choice between a list and an array is a recurring question. The two have different cost profiles and different relationships to immutability.

'a list 'a array
Indexed access O(n) O(1)
Cons / prepend O(1) not direct
Append / extend O(n) not direct
Immutable yes no
Length fixed no yes
Sharing tails yes no
Equational reasoning yes no for mutated cells

The standard trade-off is between O(1) random access (arrays) and O(1) prepending plus immutability (lists). If your computation walks the data front to back, building up a result as it goes, a list is usually the more natural shape; we have seen this all through the pattern-matching and higher-order-functions modules. If your computation needs to jump to arbitrary positions, an array is the right tool.

Array vs list: a trade-off table

'a list 'a array
Indexed access O(n) O(1)
Cons / prepend O(1) not direct
Append / extend O(n) not direct
Immutable yes no
Length fixed no yes
Sharing tails yes no
Equational reasoning yes no for mutated cells

Neither structure is good for "dynamically growing with fast indexed access." For that, OCaml 5.2 introduced Dynarray, a resizable array akin to C++'s std::vector or Java's ArrayList; for byte-string building, there is Buffer. We will not use either in this course, but it is good to know what your options are when you outgrow the trade-off above.

Allocating arrays

You rarely write a large array as a literal. The standard library gives you three workhorse constructors.

let _ = Array.make 5 0 (* = [|0; 0; 0; 0; 0|] *) let _ = Array.init 5 (fun i -> i * i) (* = [|0; 1; 4; 9; 16|] *) let _ = Array.of_list [10; 20; 30] (* = [|10; 20; 30|] *)

Array.make n x allocates an array of length n with every element initialised to x. Array.init n f allocates an array of length n where element i is computed by calling f i. Array.of_list converts an existing list into an array.

Building an array

let _ = Array.make 5 0 (* = [|0; 0; 0; 0; 0|] *) let _ = Array.init 5 (fun i -> i * i) (* = [|0; 1; 4; 9; 16|] *) let _ = Array.of_list [10; 20; 30] (* = [|10; 20; 30|] *)

Array.init is the one to remember: it is the array equivalent of writing a list comprehension or a generator. Given a length and a function from index to value, it allocates the array and runs the function on each index. You do not see the function called in a particular order, but in practice it is 0, 1, ..., n-1.

Iterating arrays

The Array module mirrors the higher-order functions we have seen on lists. Array.iter is the side-effecting walk; Array.map returns a new array; there are also fold_left, fold_right, length, to_list, and the obvious shape-shifters.

let a = [|10; 20; 30|] let () = Array.iter (fun x -> print_endline (string_of_int x)) a

This prints 10, 20, 30 on separate lines. Array.iter returns unit; it is for effect, not for value.

For a pure transformation that does not mutate the input, use Array.map:

let a = [|10; 20; 30|] let b = Array.map (fun x -> x * 2) a let _ = b (* = [|20; 40; 60|] *) let _ = a (* = [|10; 20; 30|]: untouched *)

Array.map allocates a new array and leaves the input unchanged. If you want to update the input in place, use Array.iteri and write back explicitly, or use the loop syntax we are about to see.

Iterating arrays: Array.iter

let a = [|10; 20; 30|] let () = Array.iter (fun x -> print_endline (string_of_int x)) a

Prints 10, 20, 30 on separate lines.

Pure transformation: Array.map

let a = [|10; 20; 30|] let b = Array.map (fun x -> x * 2) a let _ = b (* = [|20; 40; 60|] *) let _ = a (* = [|10; 20; 30|]: untouched *)

OCaml's for and while loops

OCaml has imperative loops, and they live in the language precisely to go with arrays and mutation. They are not the only way to write iteration, and (as we have insisted all course) they are not the default; but when you reach for an array, you usually reach for a loop alongside it.

The two forms:

for i = lo to hi do
  body
done

for i = hi downto lo do
  body
done

while condition do
  body
done

for i = lo to hi do body done runs body once for each i from lo to hi inclusive. The variable i is in scope inside the body. for i = hi downto lo do body done is the reverse. while condition do body done runs the body repeatedly while the condition is true. All three are expressions of type unit.

The body must itself be of type unit. A loop whose body returns some other type triggers a warning, the same way a sequence does: the value is being discarded.

A typical use: counting characters

Here is the kind of code where arrays earn their keep. Suppose you want to count how many times each character appears in a string. You make an array of length 256, indexed by character code, mutate it as you scan the string.

let count_chars s = let counts = Array.make 256 0 in String.iter (fun c -> counts.(Char.code c) <- counts.(Char.code c) + 1) s; counts let _ = let c = count_chars "hello" in (c.(Char.code 'l'), c.(Char.code 'o')) (* = (2, 1) *)

Two 'l's, one 'o'. The shape of the algorithm is exactly what an imperative loop in C would do: walk through the input, indexing by the current character into a fixed-size table, incrementing the counter.

A typical use: counting

let count_chars s = let counts = Array.make 256 0 in String.iter (fun c -> counts.(Char.code c) <- counts.(Char.code c) + 1) s; counts let c = count_chars "hello" let _ = c.(Char.code 'l') (* = 2 *) let _ = c.(Char.code 'o') (* = 1 *)

Could you do this without mutation? Yes: walk the string with a fold, building up a 256-tuple or a Map, returning a new structure at each step. It would be much slower and much longer to read. This is exactly the case where mutation is the right tool.

When you do not want mutation

For most everyday list-shaped work, a fold or map is clearer than an array-and-loop.

let sum_lst xs = List.fold_left (+) 0 xs let sum_arr a = let s = ref 0 in Array.iter (fun x -> s := !s + x) a; !s let _ = sum_lst [1;2;3;4;5] (* = 15 *) let _ = sum_arr [|1;2;3;4;5|] (* = 15 *)

The first version is one line and produces no intermediate mutable state. The second version is three lines and needs a ref. (You could also write the second with Array.fold_left (+) 0 a, which is again one line.)

When you don't want mutation

For most everyday list-shaped work, a fold or map is clearer than an array-and-loop:

let sum_lst xs = List.fold_left (+) 0 xs let sum_arr a = let s = ref 0 in Array.iter (fun x -> s := !s + x) a; !s let _ = sum_lst [1;2;3;4;5] (* = 15 *) let _ = sum_arr [|1;2;3;4;5|] (* = 15 *)

The discipline is the same as for ref: reach for arrays when the algorithm wants random-access mutation. If your algorithm is "walk the data and accumulate a result," a fold is clearer.

A quick check

What does the following expression evaluate to?

let a = [|1; 2; 3; 4; 5|] in a.(2) <- 99; a.(0) + a.(2) + a.(4)

Why: the assignment changes a.(2) from 3 to 99. The sum is 1 + 99 + 5 = 105.

What is the type of Array.make 5 0.0?

Why: Array.make has type int -> 'a -> 'a array. The length 5 is the first argument; the initial value 0.0 is the second. The inferred type of 'a is float, so the result is float array.

Activity

Activity

Write reverse_in_place : 'a array -> unit that reverses an array in place. After calling it, the array's contents are reversed.

Write reverse_in_place that reverses an array in place.

let reverse_in_place a = failwith "not implemented"
Show reference solution

Reference solution: the classic two-pointer swap.

let reverse_in_place a = let n = Array.length a in for i = 0 to (n / 2) - 1 do let j = n - 1 - i in let tmp = a.(i) in a.(i) <- a.(j); a.(j) <- tmp done
Show reference solution

Activity solution

let reverse_in_place a = let n = Array.length a in for i = 0 to (n / 2) - 1 do let j = n - 1 - i in let tmp = a.(i) in a.(i) <- a.(j); a.(j) <- tmp done let a = [|1; 2; 3; 4; 5|] let () = reverse_in_place a let _ = a (* = [|5; 4; 3; 2; 1|] *)
  • Returns unit; effect is to mutate a.
  • Two-pointer reverse: swap a.(i) and a.(n-1-i), halfway.
  • for ... to ... do ... done: OCaml's imperative loop.

Destructive vs immutable

A few things worth noticing in the solution. The loop runs to n / 2 - 1 rather than n - 1: each iteration swaps two positions, so we only need to walk halfway through. For an array of odd length, the middle element is its own mirror and stays in place. The function returns unit; its observable effect is the mutation. This is the standard signature pattern for in-place operations: the function returns unit and the caller passes in the structure to be modified.

If you want an immutable reverse instead, the right path is usually Array.of_list (List.rev (Array.to_list a)), or simpler, keep the data as a list in the first place.

When the count isn't known: while

for is the right loop when you know the count of iterations in advance, as in reverse_in_place above. while is the loop you reach for when the iteration stops on a runtime condition: the index where a search succeeds, the iteration where a fixpoint is reached, the point in an input stream where the structure changes. A small worked example, linear search:

let find_index p a = let n = Array.length a in let i = ref 0 in while !i < n && not (p a.(!i)) do incr i done; if !i < n then Some !i else None let _ = find_index (fun x -> x < 0) [|1; 3; -5; 7|] (* = Some 2 *) let _ = find_index (fun x -> x < 0) [|1; 3; 5; 7|] (* = None *)

Two things to notice. First, the loop variable i is a ref, because the body needs to mutate it. (OCaml's for gives you a fresh immutable binding each iteration; while is where refs and loops fit together.) Second, the loop condition uses && for short-circuit: when !i = n, the second clause not (p a.(!i)) is not evaluated, so we never index past the end of the array. Stop conditions like that are exactly what while is for.

The same shape works for any "until found / until stable / until out of input" pattern: a few refs for the state the loop is threading, a while whose condition is the negation of the termination criterion, and a final read of the refs to extract the result.

while: when you stop on a condition, not a count

let find_index p a = let n = Array.length a in let i = ref 0 in while !i < n && not (p a.(!i)) do incr i done; if !i < n then Some !i else None let _ = find_index (fun x -> x < 0) [|1; 3; -5; 7|] (* = Some 2 *)

Closing: default to immutable

The mutation toolkit (refs, mutable fields, arrays) is in the language because some algorithms genuinely want it: random-access table updates, doubly-linked structures, in-place reverses, callbacks that push state into the world. But everything we have built so far, all the way through the higher-order-functions module, did without it. There is a reason to keep that as the default.

Immutable data has three concrete payoffs. First, you do not have to think about aliasing: two names for the same value behave identically whether they share a heap object or not, because the heap object cannot change. Second, the implementation is free to share structure cheaply: a new list x :: xs shares all of xs with the original, no copy needed. Third, immutable data is a perfect fit for concurrency: two threads (or two domains, or two distant cores) can read the same value at once without locks, because there is nothing to race over. We will return to that point when we discuss data races.

The recommendation, then, is the same one OCaml itself follows: use immutable data structures by default, and reach for mutation when performance, expressivity, or interop genuinely requires it. The skill is recognising which of those three things is actually asking, versus which of them just feels like the imperative habit from another language.

Default to immutable; mutate when it pays

What you give up when you mutate:

What you get:

Rule of thumb: immutable until profiled or pinned by the algorithm.

What's next

The next lecture covers exceptions, the third member of the imperative trio (alongside refs and arrays). Exceptions let you signal "something went wrong" without threading an option or result through every layer of code. After that, Lectures 4 and 5 cover streams and laziness and memoization (a pair of techniques that build on refs and exceptions). Lectures 6 through 8 then turn to modules and the way OCaml organizes code at scale.

What's next

Lecture 3: exceptions.

Reading

Sources

This lecture's prose, worked examples, and quizzes are original to this course. Materials referenced during preparation are listed in the Reading section above; Cornell CS3110 and Real World OCaml are CC BY-NC-ND-licensed and have not been derivatively reused. See LICENSES.md at the repository root for the full source posture.