Records

Functional Programming with OCaml

Module 4 · Lecture 2

KC Sivaramakrishnan
IIT Madras

A record is a bundle, like a tuple, but with each component identified by name instead of position. The last lecture argued that tuples are the right tool for two or three values whose positions are self-evident: a 2D point, a key-value entry, a quotient-remainder pair. The moment you want more than that, or the positions stop telling their own story, you reach for a record.

A record in OCaml plays the same role that a struct plays in C or a class with only data members plays in Java. The syntax is similar in spirit. The semantics has two differences worth internalising up front: records are immutable by default, and they are structurally compared. Both of those make a record behave more like a value (an int, a string) than like an object.

If you have not built a data-modelling habit before, this is the lecture where it starts. Most of the data structures you will design over the rest of the course are records, variants, or combinations of the two.

Declaring a record type

Unlike tuples, records require a type declaration before you can construct one. You declare the type, with each field's name and type; then you construct values of that type.

type point = { x : float; y : float }
let origin = { x = 0.0; y = 0.0 }
let p = { x = 3.0; y = 4.0 }

The declaration type point = { x : float; y : float } introduces a new type named point. It has two fields, x and y, both of type float. Then origin and p are values of type point, built with the record-literal syntax { field = value; field = value }.

Note the small syntactic detail: inside the type declaration, the separator between fields is ; and the punctuation between a name and its type is :. Inside an expression that builds a record, the separator is also ; (not ,!) and the punctuation between a name and its value is =. This mirrors the difference between type syntax and expression syntax we have already seen elsewhere.

The order of fields in an expression literal is irrelevant. { x = 3.0; y = 4.0 } and { y = 4.0; x = 3.0 } are the same value. The type declaration gives the canonical order; the compiler does not care how you list them when constructing.
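A quick way to confirm that field order does not matter is to compare two literals written in different orders. The sketch below reuses the point type from above; the names p_xy, p_yx and same are introduced here only for illustration.

```ocaml
type point = { x : float; y : float }

(* Same fields, listed in different orders. *)
let p_xy = { x = 3.0; y = 4.0 }
let p_yx = { y = 4.0; x = 3.0 }

(* Structural equality sees no difference between them. *)
let same = (p_xy = p_yx) (* = true *)
```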

OCaml's records are nominally typed: two record types with the same fields are not interchangeable. If you declare type a = { x : int } and type b = { x : int }, then an a value cannot be passed where a b is expected, even though they have the same shape. This is the opposite of TypeScript's structural object types, and it is a deliberate choice to make types more meaningful identifiers.
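A minimal sketch of what this means in practice, using the two types a and b from the paragraph above. The value and function names (va, vb, get_b, from_b) are hypothetical, and the type annotations are there so that each literal is tied to the intended type despite the shared field name.

```ocaml
type a = { x : int }
type b = { x : int }

(* Annotations state which type each literal belongs to. *)
let va : a = { x = 1 }
let vb : b = { x = 1 }

(* A function that insists on a b. *)
let get_b (r : b) = r.x

(* Accepted: vb really is a b. *)
let from_b = get_b vb

(* Rejected at compile time if uncommented: va has type a,
   and get_b expects a b, even though the shapes match.
   let from_a = get_b va *)
```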

Accessing fields

Two ways to get a value out of a record: dot syntax (like Java or Python), and destructuring (like patterns).

type point = { x : float; y : float }
let p = { x = 3.0; y = 4.0 }
let _ = p.x (* = 3. *)
let _ = p.y (* = 4. *)

p.x and p.y are field-access expressions. The result of p.x is 3.0; the result of p.y is 4.0. The field name is syntactically restricted: the thing after the dot must be a literal field name, not a computed value. (The form p.(some_expression) does exist in OCaml, but it is array indexing, not a computed field lookup.)

Accessing fields: destructuring

type point = { x : float; y : float }
let p = { x = 3.0; y = 4.0 }
let { x; y } = p
let _ = x (* = 3. *)
let _ = y (* = 4. *)

The destructuring form let { x; y } = p works the same way as the tuple destructuring let (x, y) = pair from last lecture. It matches the record against the pattern and binds each name. The shorthand { x; y } desugars to { x = x; y = y }: the field name on the left, the new local name on the right, with the convention that they match when no rename is needed.

If you have used JavaScript ES6 or TypeScript, this is exactly the destructuring-assignment shorthand: const { x, y } = p;. Same idea, with ; as the separator instead of ,. The shorthand makes code that pulls several fields out of a record cleaner, especially when the names line up with what you want to call the locals.

You can rename while destructuring:

type point = { x : float; y : float }
let p = { x = 3.0; y = 4.0 }
let { x = ax; y = ay } = p

Now the locals are called ax and ay. This is useful when you are pulling fields from two records in the same scope and need to distinguish them.
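As an illustration of the two-records case, the sketch below destructures two points in the same scope and combines them. The values a and b and the name midpoint are assumptions made for this example.

```ocaml
type point = { x : float; y : float }

let a = { x = 0.0; y = 0.0 }
let b = { x = 2.0; y = 4.0 }

(* Rename on the way out so the two sets of coordinates do not collide. *)
let { x = ax; y = ay } = a
let { x = bx; y = by } = b

(* Build a new point from the four locals. *)
let midpoint = { x = (ax +. bx) /. 2.0; y = (ay +. by) /. 2.0 }
```

Without the renames, the second destructuring would shadow the x and y bound by the first.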

Records in function parameters

A function that takes a record can access fields either by dot or by destructuring in the parameter pattern.

type point = { x : float; y : float }
let distance p q =
  let dx = q.x -. p.x in
  let dy = q.y -. p.y in
  sqrt (dx *. dx +. dy *. dy)
let _ = distance { x = 0.0; y = 0.0 } { x = 3.0; y = 4.0 } (* = 5. *)

The body uses p.x, q.x, etc. The same function, written with destructured parameters:

let distance { x = x1; y = y1 } { x = x2; y = y2 } =
  let dx = x2 -. x1 in
  let dy = y2 -. y1 in
  sqrt (dx *. dx +. dy *. dy)

There is no functional difference between the two styles. The destructuring form is denser at the top of the function and looser in the body; the dot-syntax form is the reverse. Use whichever makes the function read more clearly. For functions that touch two or three fields of a five-field record, dot syntax is usually cleaner; for functions that use every field, destructuring is nicer.
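To make the rule of thumb concrete, here is a sketch with a hypothetical five-field record. The employee type, its fields, and the functions badge and describe are all invented for this illustration.

```ocaml
(* Hypothetical five-field record, for illustration only. *)
type employee = {
  name : string;
  email : string;
  department : string;
  salary : int;
  years : int;
}

(* Touches two of the five fields: dot syntax keeps the parameter short. *)
let badge e = e.name ^ " (" ^ e.department ^ ")"

(* Uses every field: destructuring names them all once, up front. *)
let describe { name; email; department; salary; years } =
  Printf.sprintf "%s <%s>, %s, salary %d, %d years"
    name email department salary years

let asha =
  { name = "Asha"; email = "asha@example.org"; department = "Compilers";
    salary = 100; years = 12 }
```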

Functional update

Records are immutable by default. To "modify" a field, you build a new record that differs from the old one in just that field. OCaml gives you a syntactic shortcut for this, called functional update:

type point = { x : float; y : float }
let p = { x = 3.0; y = 4.0 }
let p2 = { p with y = 10.0 }

The expression { p with y = 10.0 } produces a new record whose fields are the same as p's, except y is 10.0. The original p is unchanged.

Functional update: confirming p is unchanged

type point = { x : float; y : float }
let p = { x = 3.0; y = 4.0 }
let p2 = { p with y = 10.0 }
let _ = p.y (* = 4. *)
let _ = p2.y (* = 10. *)

Functional update is a quiet but important feature. In any program that needs to "modify" a record (a user's profile, a piece of state, a configuration), you write a new version with the changed field and pass that new version forward. The old version is still valid; nothing observable about it has changed.

This buys you the same property we discussed for shadowing in the let-bindings lecture: equational reasoning. The value p is what it is forever. Nothing later in the program can have changed p.y underneath you. Once you have a record, you can reason about it without worrying that some other code path mutated it.

For records with many fields, the with syntax is essential. Writing out a 19-field literal to change one value would be silly; { r with that_one_field = new_value } is exactly what you want.
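A sketch of the larger-record case, using a hypothetical five-field config type (the type, its fields, and the values are assumptions for illustration). It also shows that a single update can override several fields at once.

```ocaml
(* Hypothetical configuration record, for illustration only. *)
type config = {
  host : string;
  port : int;
  use_tls : bool;
  timeout_s : float;
  retries : int;
}

let default_config =
  { host = "localhost"; port = 8080; use_tls = false;
    timeout_s = 30.0; retries = 3 }

(* Change one field; the other four are copied from default_config. *)
let secure = { default_config with use_tls = true }

(* Several fields can be overridden in one update, separated by ; *)
let production =
  { default_config with host = "example.org"; port = 443; use_tls = true }
```

After these definitions, default_config still has use_tls = false and port = 8080.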

The mechanism is not free: under the hood, OCaml allocates a new record and copies the unchanged fields. For most records this is imperceptible; for performance-critical inner loops on large records, you may eventually want mutable fields, which are deferred to Module 7.

Records vs tuples: when to use which

We have now seen both compound types. Before the practical guidance, a motivating example. Suppose you want to model a rectangle by two corner points. The tuple form is (int * int) * (int * int). Now: which point is which corner? Bottom-left first, top-right second? Or top-left, bottom-right? The type does not say; callers have to guess (or read the doc comment). With a record:

type rectangle = {
  bottom_left : int * int;
  top_right : int * int;
}
let r = { bottom_left = (1, 2); top_right = (5, 6) }

The field names make the convention explicit at the type and at every use site. Two components is on the edge of "tuples are fine"; the moment a position is not self-evident, the record wins even at small arity.

Why named fields: rectangles

A rectangle from two corner points:

type rect_tuple = (int * int) * (int * int)
type rectangle = {
  bottom_left : int * int;
  top_right : int * int;
}
  • Tuple: which corner is first?
  • Record: the field name says.
  • Two components, positions not self-evident.

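Use a record when:
  • there are more than two or three components;
  • the positions do not explain themselves, as with the two corners of a rectangle;
  • you want function signatures to name the type they accept and return.

Use a tuple when:
  • there are only two or three values and their positions are self-evident: a key-value entry, a quotient-remainder pair.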

A worked example. Suppose you want to model "a person."
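One way to make the comparison concrete is the sketch below. The fields name, email and age, the sample values, and the helper names are assumptions chosen for illustration.

```ocaml
(* Tuple version: three components, and the type alone does not say
   which string is the name and which is the email. *)
type person_tuple = string * string * int

let p_tuple : person_tuple = ("Asha", "asha@example.org", 34)

(* Reading a component means remembering its position. *)
let email_of_tuple ((_, email, _) : person_tuple) = email

(* Record version: one type declaration, then every use names its field. *)
type person = { name : string; email : string; age : int }

let p_record = { name = "Asha"; email = "asha@example.org"; age = 34 }

let email_of_record p = p.email
```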

The record version costs you a type declaration but pays you back every time you read or write a field. For anything more than two or three components, this trade is overwhelmingly worth it.

A counter-example: a 2D point. Both options work. (x, y) is fine and brief. { x; y } is more explicit but verbose. Most OCaml code uses records for points too, because once you have a named type point, every function signature that takes or returns one says so unambiguously. A function that accepts float * float is ambiguous: it could be a point, a vector, a rectangle's dimensions, anything.
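The signature argument can be seen in a small sketch. The vector type and the translate functions are hypothetical, introduced only to contrast the inferred types.

```ocaml
type point = { x : float; y : float }
type vector = { dx : float; dy : float }

(* Inferred type: point -> vector -> point.
   The signature says which argument is which. *)
let translate p v = { x = p.x +. v.dx; y = p.y +. v.dy }

(* Inferred type:
   float * float -> float * float -> float * float.
   Nothing stops a caller from swapping the two arguments. *)
let translate_tuple (x, y) (dx, dy) = (x +. dx, y +. dy)
```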

The deeper habit a record encourages is giving data a named, checkable type rather than leaving it as a bare string or number. Geneticists learned this the expensive way. For years, spreadsheets silently turned gene symbols like SEPT2 and MARCH1 into dates (2-Sep, 1-Mar), corrupting published data sets; a 2016 study found the error in hundreds of papers. The fix, in 2020, was not better discipline but a type change: the gene-naming committee renamed the genes themselves (SEPT2 became SEPTIN2) so the values could no longer be mistaken for something else. A field typed and named for what it holds is the programmer's version of the same defence.

Records compare structurally

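Like ints, strings and tuples, records support the structural equality operator =. Two records are equal if and only if all their corresponding fields are equal. (One caveat: = can raise an exception at run time if it has to compare function values, so records that hold functions should not be compared this way.)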

type point = { x : float; y : float }
let p1 = { x = 1.0; y = 2.0 }
let p2 = { x = 1.0; y = 2.0 }
let _ = p1 = p2 (* = true *)

The expression p1 = p2 is true, because both records have identical field values. This is the same = we have been using for ints and strings throughout. No special "equals" method to define, no equals() override; the compiler handles it.

Contrast Java, where == is reference equality (almost always not what you want) and you have to write .equals(), taking care to make it consistent with hashCode(). Contrast C, where == is simply not defined on structs, and the common workaround, memcmp, compares the entire memory block including padding bytes. OCaml's = does the right thing on records and gives you correct equality "for free."
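Structural equality also looks inside nested components. The sketch below reuses the rectangle type from earlier in the lecture, whose fields are tuples; the values r1, r2, r3 and the two result names are assumptions for illustration.

```ocaml
type rectangle = { bottom_left : int * int; top_right : int * int }

let r1 = { bottom_left = (1, 2); top_right = (5, 6) }
let r2 = { bottom_left = (1, 2); top_right = (5, 6) }
let r3 = { r1 with top_right = (7, 8) }

(* Equal: every field, including the tuples inside, matches. *)
let same_shape = (r1 = r2) (* = true *)

(* Not equal: one nested component differs. *)
let different = (r1 <> r3) (* = true *)
```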

Type inference for records can surprise you

There is one wrinkle that catches people, related to how the compiler infers the type of a record literal. Tuples are straightforward: (3, true) is unambiguously int * bool because of the inferred types of its components. Records are different: the compiler needs to know which record type a literal refers to, and the way it figures that out is by looking at the field names.

type point2 = { x : float; y : float }
type point3 = { x : float; y : float; z : float }
let p = { x = 1.0; y = 2.0 }

What is the type of p? Both point2 and point3 have an x and a y field. OCaml resolves this by preferring the most recently declared matching type. Here, point3 was declared second, but p only has two fields, so point3 cannot match (a point3 literal needs all three fields). The compiler falls back to point2, the next match, and p gets type point2.

In practice this rarely surfaces, because most files declare each field name in exactly one record type. When it does come up, add a type annotation: let p : point2 = { x = 1.0; y = 2.0 }, and the ambiguity is resolved.

A consequence to be aware of: if you have two record types in the same scope with overlapping field names, dot-access expressions become ambiguous in the same way. We will see in Module 7 how modules let you put each record type in its own namespace, which sidesteps the problem entirely.
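Until modules are available, annotations handle both the literal and the dot-access case. A minimal sketch, assuming the point2 and point3 declarations above; the functions sum2 and sum3 are hypothetical.

```ocaml
type point2 = { x : float; y : float }
type point3 = { x : float; y : float; z : float }

(* Annotating the binding states which type the literal belongs to. *)
let p : point2 = { x = 1.0; y = 2.0 }

(* Without an annotation, q.x would be resolved against the most
   recently declared type that has an x field (point3). Annotating
   the parameter makes the intended type explicit. *)
let sum2 (q : point2) = q.x +. q.y
let sum3 (q : point3) = q.x +. q.y +. q.z
```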

Records are immutable by default. OCaml does allow individual fields to be opted into in-place mutation (with a mutable keyword in the type declaration and a <- assignment operator), but we defer that to Module 7, where references and the rest of the mutation story land together. For Module 4, every record is immutable, and the functional-update form above ({ p with ... }) is how you "change" a field.

A small check

Given:

type rgb = { r : int; g : int; b : int }
let red = { r = 255; g = 0; b = 0 }

What does { red with g = 128 } evaluate to?

Answer: { r = 255; g = 128; b = 0 }. Why: functional update produces a new record that copies the fields of red and overrides g with 128. red is unchanged. The result has the same r and b as red, plus the new g.
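The reasoning can be checked directly. In this sketch the name orange is an assumption made for illustration; the type and red come from the question.

```ocaml
type rgb = { r : int; g : int; b : int }
let red = { r = 255; g = 0; b = 0 }

(* New record: r and b copied from red, g overridden. *)
let orange = { red with g = 128 }

(* Both comparisons below evaluate to true. *)
let _ = orange = { r = 255; g = 128; b = 0 }
let _ = red.g = 0
```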

Define a record circle with fields cx : float, cy : float, radius : float, and write a function circle_area : circle -> float that returns the area. Use Float.pi.

type circle = { cx : float; cy : float; radius : float }
let circle_area c = failwith "not implemented"

Reference solution: let circle_area c = Float.pi *. c.radius *. c.radius, or with destructuring let circle_area { radius; _ } = Float.pi *. radius *. radius. Either works; the second ignores the centre fields explicitly.

Activity

Define a record type book with fields title : string, author : string, year : int. Create one. Define a function book_title that returns the title.

Activity solution

type book = { title : string; author : string; year : int }
let real_world_ocaml = {
  title = "Real World OCaml";
  author = "Minsky, Madhavapeddy, Hickey";
  year = 2013;
}
let book_title b = b.title
let _ = book_title real_world_ocaml (* = "Real World OCaml" *)

Activity solution: destructure in the parameter

More idiomatic: pattern-match the record in the parameter list.

type book = { title : string; author : string; year : int }
let book_title { title; _ } = title
  • _ ignores the other fields.
  • Without the _, OCaml warns the pattern is incomplete.

The _ at the end of { title; _ } is important. Without it, the pattern { title } would still match any book, but it leaves the other fields unmentioned, and the compiler can warn about that (warning 9, which many build setups enable). The trailing _ says explicitly "yes, I know there are other fields; I do not care about them." Use it whenever you destructure a subset of fields.
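The same pattern extends to any subset of fields. A short sketch, assuming the book type from the activity; the citation function is hypothetical.

```ocaml
type book = { title : string; author : string; year : int }

(* Pull out two of the three fields; _ acknowledges the rest. *)
let citation { title; year; _ } = Printf.sprintf "%s (%d)" title year

let real_world_ocaml =
  { title = "Real World OCaml";
    author = "Minsky, Madhavapeddy, Hickey";
    year = 2013 }

let _ = citation real_world_ocaml (* = "Real World OCaml (2013)" *)
```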

What's next

Lecture 3: variants (also called sum types or tagged unions). Records are "this and that". Variants are "this or that". The combination of records and variants is how you model real data in OCaml.

Tuples and records are both product types: a value of one has a piece of each of several types. The next lecture introduces the dual notion, sum types: a value that is one of several alternatives. Once you have both products and sums, you have the full algebra of algebraic data types, and you can model essentially any data shape you encounter.

Reading

Sources

This lecture's prose, worked examples, and quizzes are original to this course. Materials referenced during preparation are listed in the Reading section above; Cornell CS3110 and Real World OCaml are CC BY-NC-ND-licensed and have not been derivatively reused. See LICENSES.md at the repository root for the full source posture.