Tuples
The first three modules of this course have been about computation:
literals, bindings, conditionals, functions, recursion. We have
moved a lot of ints and strings and bools through the
language, but always one at a time. Real programs rarely deal with
single values. They deal with aggregates: a 2D point bundles an
x and a y; a key-value entry bundles a key with its value; a
return value from "divide with remainder" is a quotient and a
remainder. Module 4 is about how OCaml lets you build, name, and
take apart such aggregates.
This first lecture covers the simplest aggregate the language
offers: the tuple. A tuple is a fixed-size bundle of values,
possibly of different types, identified by position. The number
of components a tuple carries is called its arity (borrowed
from logic: "arity" is the number of arguments a relation takes).
The pair (3, "hello") is a tuple of arity two; the triple
(1, 2.0, true) is a tuple of arity three. The next two lectures cover
records (aggregates with named fields)
and variants (a different kind of
aggregate: "one of several shapes" rather than "this and that").
Together, tuples, records, and variants are the three building
blocks of every data type you will design in OCaml.
If you have used Python, you have seen tuples before; Go's multiple return values are a close relative. OCaml tuples are similar in spirit but differ in one important way: their arity is part of their type. A pair and a triple are not just "tuples of different sizes"; they are different types entirely. That is a small detail with large consequences, and we will spend much of this lecture on what it buys you.
A tuple is several values bundled
The syntax is what you would guess from any other language. Put several expressions inside parentheses, separated by commas, and you have a tuple.
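A minimal sketch of three such literals follows. The names a, b, and c and the particular values are assumptions chosen for illustration; only the shapes matter.

```ocaml
(* Hypothetical example values; the comment on each line is the type
   the toplevel would report for it. *)
let a = (1, true)            (* int * bool *)
let b = (1, "one", 1.0)      (* int * string * float *)
let c = ((0, 0), (3, 4))     (* (int * int) * (int * int) *)
```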
The toplevel reports the types as int * bool, int * string * float, and (int * int) * (int * int). The * in a type
position is read as "and": "an int and a bool," "an int
and a string and a float."
The product-type notation int * bool trips up beginners, because
in expression position * is integer multiplication. There is no
ambiguity for the compiler: the * between int and bool is in
a type position, and the * in 3 * 4 is in a value position.
Two different syntactic worlds, sharing one symbol. You will get
used to reading int * bool as "pair of int and bool" quickly
enough.
The mathematical reason for the symbol is that the set of values
of type int * bool is the Cartesian product of the set of ints
and the set of bools. Every pair is one int paired with one
bool; the number of distinct pairs is |int| * |bool|. Hence the
name product type, and hence the *. The
next lecture covers records, which are
also product types (just with named components instead of
positional ones). The
lecture after that covers variants, which
are sum types: the dual notion.
Tuples have a fixed size as part of their type
Here is the property that distinguishes OCaml tuples from Python tuples or JavaScript arrays:
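A small sketch, with the hypothetical names p2 and p3 and the types written out as annotations (OCaml would infer the same types without them):

```ocaml
(* A pair and a triple of ints. The annotations are optional and are
   spelled out only to make the two types visible. *)
let p2 : int * int = (1, 2)
let p3 : int * int * int = (1, 2, 3)
```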
These two values have different types. You cannot pass an int * int * int where an int * int is expected, and vice versa. Each
tuple shape is its own type, distinguished by both the arity and
the types of the components.
You can watch the compiler reject the mismatch by annotating a triple
as a pair: in let p : int * int = (1, 2, 3), the annotation says
int * int, but the value is a triple.
The error names the mismatch explicitly: the expression has type
int * int * int but an expression of type int * int was
expected. The rejection happens at type-checking time, before any
code runs.
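The sketch below shows the accepted case next to the rejected ones. The names ok, sum2, and three are hypothetical, and the rejected lines are left in comments so that the block still compiles.

```ocaml
(* Accepted: the annotation and the value agree. *)
let ok : int * int = (1, 2)

(* A function whose parameter is annotated as a pair of ints. *)
let sum2 ((a, b) : int * int) = a + b
let three = sum2 (1, 2)

(* Both lines below are rejected by the type checker if uncommented,
   because a triple appears where a pair is expected:
     let bad : int * int = (1, 2, 3)
     let oops = sum2 (1, 2, 3)
*)
```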
Compare this with Python: (1, 2) and (1, 2, 3) both have type
tuple. A Python function that "takes a 2-tuple" cannot enforce that in
its signature (type hints are not checked at runtime); it has to check
at runtime that the length is 2 and
raise an error otherwise. OCaml's choice is the opposite: a
function that takes an int * int cannot even be called with a
3-tuple. The compiler rejects the call site. The dividend is that
you cannot accidentally hand a "point in 3D" to a function that
expected a "point in 2D"; the type system rules out the bug before
your test suite gets a chance.
The cost is conceptual: each shape is its own type, so a function that "averages a tuple of numbers" cannot be written once and used on tuples of any size. If you genuinely need that, you reach for a list of numbers rather than a tuple. Tuples are for small, fixed groups where the shape is part of the type's identity.
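As a sketch of the list-based alternative, here is one way to average any number of floats. The name average is hypothetical, and the function assumes a non-empty list (an empty list would yield nan).

```ocaml
(* Average of a list of floats. Works for any length, which no single
   tuple type can offer. Assumes xs is non-empty. *)
let average xs =
  List.fold_left ( +. ) 0.0 xs /. float_of_int (List.length xs)

let m2 = average [10.0; 20.0]
let m3 = average [1.0; 2.0; 3.0]
```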
Constructing and extracting
Construction is just the literal you have already seen: write the components inside parentheses, separated by commas.
Extraction is the more interesting part. For pairs, OCaml's standard library gives you two convenience functions:
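A minimal sketch of the two functions in use, on a hypothetical pair named p:

```ocaml
let p = (3, "hello")

let first = fst p      (* the int component *)
let second = snd p     (* the string component *)
```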
fst returns the first component of a pair; snd returns the
second. Their types tell the story:
val fst : 'a * 'b -> 'a
val snd : 'a * 'b -> 'b
They are polymorphic: they work on any pair, regardless of the
types of the two components. But notice they are for pairs only.
There is no third function in the standard library. For triples
and larger, OCaml expects you to destructure.
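A sketch of destructuring with let, using assumed example values; the bound names are arbitrary.

```ocaml
(* Bind both components of a pair in one step. *)
let (x, y) = (3, "hello")

(* The same form works for a triple, where fst and snd do not apply. *)
let (a, b, c) = (1, 2.0, true)
```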
This is the first time we have put anything more structured than a
single name between let and =. The thing in that position is called
a pattern. For tuples, the pattern is (x, y) or (x, y, z),
with one name (or _) per component. OCaml matches the right-hand
side against this pattern and binds each name to the corresponding
component. Pattern matching is the subject of all of
Module 5; for this module we use it
informally as it appears.
If the pattern's arity does not match the value's arity, the
compiler complains at compile time, not at runtime. Writing let (x, y) = (1, 2, 3) is a type error: the pattern expects a pair,
the value is a triple, and the two types do not match. You cannot
get into a situation where the code compiles but then crashes
because you destructured the wrong shape.
The underscore _ in a pattern means "match anything here, but do
not bind a name to it." If you want only the first component of a
triple, write:
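A one-line sketch, taking the triple (1, 2, 3) as an assumed example value:

```ocaml
(* Keep the first component; ignore the second and third. *)
let (x, _, _) = (1, 2, 3)
```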
x is now 1; the other two components are discarded. This is
the standard way to project out a single component of a larger
tuple.
Pattern matching in function arguments
Patterns are not just for let. They can appear in function
parameters too. Here is a function that computes Euclidean distance
between two 2D points represented as pairs:
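One way such a function might be written is sketched below. The helper names dx and dy are assumptions; the parameter patterns and the use of -. follow the description in this section.

```ocaml
(* Euclidean distance between two points given as float pairs.
   Both parameters are tuple patterns. *)
let distance (x1, y1) (x2, y2) =
  let dx = x2 -. x1 in
  let dy = y2 -. y1 in
  sqrt (dx *. dx +. dy *. dy)

(* A 3-4-5 right triangle. *)
let d = distance (0.0, 0.0) (3.0, 4.0)
```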
The two parameters are patterns (x1, y1) and (x2, y2). When
you call distance (0.0, 0.0) (3.0, 4.0), OCaml matches the first
argument against (x1, y1), binding x1 = 0.0 and y1 = 0.0,
and similarly for the second. We will study pattern matching deeply
in Module 5, where you will see that
function parameters are one of several places where any OCaml
pattern can appear.
The inferred type is float * float -> float * float -> float. The
function takes two arguments, each a pair of floats, and returns a
float. We did not write a single type annotation; the body forced
all components to be floats (because of -.), and the parameter
shape forced each argument to be a pair.
A subtle point: the function distance takes two arguments, each
of which happens to be a pair. It is not a function of one
argument that is a 4-tuple. The difference shows up in the type:
float * float -> float * float -> float versus
float * float * float * float -> float. We will see in Module 6 that the choice matters when partial application enters the picture.
Argument lists versus tuples: a common confusion
This is worth pausing on, because it is the single largest source of confusion for students arriving in OCaml from C-family languages. Consider these two function definitions:
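A sketch of the two styles side by side. The names add_curried and add_tupled match the signatures discussed in this section; the arguments 3 and 4 and the names r1 and r2 are assumed for illustration.

```ocaml
(* Curried: two arguments, supplied one after the other. *)
let add_curried x y = x + y

(* Tupled: one argument, which is a pair. *)
let add_tupled (x, y) = x + y

let r1 = add_curried 3 4
let r2 = add_tupled (3, 4)
```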
Both compute 7, but the function signatures are different:
add_curried : int -> int -> int
add_tupled : int * int -> int
add_curried takes two arguments, applied one at a time (this
is the curried form we saw in
the functions-as-values lecture
and studied in the currying lecture). It
can be
partially applied: add_curried 3 is a meaningful value of type
int -> int.
add_tupled takes one argument, which happens to be a pair. It
cannot be "partially applied to the first component"; the whole
pair must be supplied at once.
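The difference can be sketched as follows. The names add3, ten, and bumped are hypothetical, and both functions are redefined here so the block stands on its own.

```ocaml
let add_curried x y = x + y
let add_tupled (x, y) = x + y

(* Partial application: supplying only the first argument yields a
   function of type int -> int. *)
let add3 = add_curried 3
let ten = add3 7

(* Partially applied functions slot directly into higher-order code. *)
let bumped = List.map (add_curried 1) [1; 2; 3]

(* The tupled version needs the whole pair at once. *)
let seven = add_tupled (3, 4)
```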
If you have C or Python reflexes, you may want to write f(x, y)
for every two-argument function. In OCaml, f (x, y) calls a
function that takes a single pair as its argument. The curried
call is f x y, with arguments separated by spaces, no parens.
Mixing these up is the most common syntax mistake of the first
week. The error message you get is usually some variant of "this
expression has type int * int but an expression was expected of
type int," and at first it is mysterious; once you have seen it
twice, the cause is obvious.
The idiomatic rule: use curried arguments (f x y) by default,
because they allow partial application and read more naturally in
the higher-order style we will lean on in
Module 6. Use a tuple argument
only when the two values genuinely belong together as one unit
(a coordinate pair, a key-value entry) and never make sense in
isolation.
Tuples are for heterogeneous data of known shape
A short summary of when to reach for a tuple: use one when the bundle has two or three components and each position tells its own story.
This rule of thumb is informal but practical. A 2D point is a
pair: (x, y). Nobody confuses which is the abscissa. A key-value
entry is a pair: (key, value). The convention is universal
enough that the position is self-documenting. But a "person" with
a first_name, last_name, age, phone, email, address is
not a 6-tuple, even though you could technically write it that
way. Code that consumes such a tuple would start asking, "wait,
was the email index 4 or 5?" and the bugs follow.
Records are for that case; we will see
them in the next lecture.
Returning multiple values
OCaml functions always return one value. But that value can be a tuple, which is the standard way to return multiple results.
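A sketch of "divide with remainder" written this way. The name div_mod is hypothetical; / and mod are OCaml's built-in integer operators.

```ocaml
(* Integer division returning both results as one pair. *)
let div_mod a b = (a / b, a mod b)
```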
The function returns a pair (quotient, remainder). The caller
destructures:
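A sketch of the call site, with assumed arguments 17 and 5. The hypothetical div_mod is repeated so the block stands on its own.

```ocaml
(* Repeated here only so this block is self-contained. *)
let div_mod a b = (a / b, a mod b)

(* One let binds both results by position. *)
let (q, r) = div_mod 17 5
```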
Python has return q, r. Go has named return values and return q, r. C has no direct equivalent (people return a struct or pass output pointers).
OCaml's mechanism is a tuple return, which is the same answer
Python is implicitly giving (Python's "multiple return values"
build a tuple under the hood). The only difference is that OCaml
makes the tuple explicit, which is exactly the trade we have seen
the language make many times: more visible syntax, less hidden
machinery.
Tuples in collections
You will often see lists of tuples. Each tuple is a "row" in a small table.
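A minimal sketch of such a table, bound to the name pairs. The rows themselves are assumed example data.

```ocaml
(* Each element is one row: a number and its English name. *)
let pairs = [(1, "one"); (2, "two"); (3, "three")]
```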
The type of pairs is (int * string) list: a list of int * string pairs. Building these tables is something we can do now;
searching them by key needs pattern matching, which is the topic
of Module 5. The standard library
function List.assoc_opt is what you reach for once you have
option and patterns in hand.
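As a preview only, here is a sketch of List.assoc_opt on a small table of assumed example rows. The result is an option, which later material explains; the names found and missing are hypothetical.

```ocaml
let pairs = [(1, "one"); (2, "two"); (3, "three")]

(* Look a key up in the table. The answer is wrapped in an option
   because the key may be absent. *)
let found = List.assoc_opt 2 pairs
let missing = List.assoc_opt 9 pairs
```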
A common pitfall: tuples and operator precedence
The comma binds more loosely than the arithmetic, comparison, and boolean operators. When you write a tuple inside a more complex expression, the parentheses are not just for show:
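A sketch of how the comma interacts with neighbouring expressions. The names f, g, t, and u are hypothetical.

```ocaml
(* With parentheses: clearly a function returning a pair. *)
let f x = (x, x + 1)

(* Without parentheses: the same function, because the comma sits at
   the top of the let body. *)
let g x = x, x + 1

(* Arithmetic binds tighter than the comma, so this is the pair (1, 5). *)
let t = 1, 2 + 3

(* Here the parentheses are required. Writing fst 1, 2 would parse as
   (fst 1), 2 and be rejected by the type checker. *)
let u = fst (1, 2) + 10
```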
Without parens, f x = x, x + 1 would still parse (commas at the
top of a let body are tuple constructors), so you sometimes see
it. But the moment you have anything around the tuple, you need
the parens.
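The same rule applies inside list brackets. A minimal sketch, binding the name xs to a literal written with a comma where a semicolon was probably intended:

```ocaml
(* Compiles without complaint, but the comma builds a tuple. *)
let xs = [1, 2]
```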
This is the booby trap. You expect xs to be a list of two
integers. It is not. It is a list of one element, that element
being the pair (1, 2). The type is (int * int) list, not int list. The compiler does not warn you; it is a perfectly valid
list literal, just not the one you meant.
The right separator for lists is ;. The right separator inside
tuples (or between fields in a record) is ,. Confusing the two
gives you valid-looking code that means something different. The
first time you write [1, 2] instead of [1; 2], the compiler
will give you a strange error somewhere downstream (typically when
you try to add the elements: it complains that the elements are
int * int, not int). That strange error is your hint that the
list literal does not mean what you intended.
Pattern matching in let: a small check
What is the type of the function let f (x, y) = (y, x)?
- 'a -> 'a -> 'a * 'a
- 'a * 'b -> 'b * 'a
- 'a * 'a -> 'a * 'a
- 'a -> 'b -> 'b * 'a
Why: the function takes one argument, a pair, and returns a
pair with the components swapped. The two components can have
different types, so the type variables for input and output are
'a and 'b. The output is a pair (y, x) of types 'b * 'a. So
the whole thing is 'a * 'b -> 'b * 'a. The 'a -> 'a -> ...
options would mean a curried function of two arguments, which is
not what the pattern (x, y) does.
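A quick sketch that exercises the answer. The name swap is hypothetical, and the calls use pairs whose components have different types to show that 'a and 'b are independent.

```ocaml
(* One argument, a pair; returns the pair with components exchanged. *)
let swap (x, y) = (y, x)

let s1 = swap (1, "one")
let s2 = swap (true, 2.5)
```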
Which of these expressions has type int list?
- [1, 2, 3]
- [1; 2; 3]
- (1, 2, 3)
- [(1, 2, 3)]
Why: lists separate elements with ;. The expression [1, 2, 3] is a one-element list whose element is the triple (1, 2, 3),
so its type is (int * int * int) list. (1, 2, 3) is a triple,
not a list. [(1, 2, 3)] is a one-element list of triples.
A small code task:
Write pair_max : int * int -> int that returns the larger of the
two components.
Reference solution: let pair_max (x, y) = if x > y then x else y.
Pattern destructure in the argument; one if-expression body.
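The reference solution can be checked with a few sample calls. The inputs below are assumed test values.

```ocaml
(* Larger of the two components of an int pair. The annotation pins
   the type to int * int, as the task requires. *)
let pair_max ((x, y) : int * int) = if x > y then x else y

let m1 = pair_max (3, 7)
let m2 = pair_max (7, 3)
let m3 = pair_max (4, 4)
```

Without the annotation, OCaml would infer the more general type 'a * 'a -> 'a, because > is polymorphic.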
Activity
Write a function of type 'a * 'b * 'c -> 'a that returns the first component of a triple.
The function works on any triple, regardless of the types of the
three components. That is parametric polymorphism at work: we did
not constrain 'a, 'b, or 'c, so the function can be called on
a triple of any types and will return whatever was in the first
slot.
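A sketch of such a function, under the assumption that the activity asks for the first component of a triple. The name first_of_three is hypothetical.

```ocaml
(* Return the first component of any triple; the other two components
   are ignored with wildcard patterns. *)
let first_of_three (x, _, _) = x

let n = first_of_three (1, "two", 3.0)
let s = first_of_three ("a", 2, false)
```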
What's next
We have seen how to bundle a small fixed number of values by position. The next lecture extends this to named fields, which is the right tool for any bundle larger than three components or where the positions do not tell their own story.
Reading
- Cornell CS3110, Records and Tuples: https://cs3110.github.io/textbook/chapters/data/records_tuples.html
- Real World OCaml, Lists and Patterns (tuple section): https://dev.realworldocaml.org/lists-and-patterns.html
Sources
This lecture's prose, worked examples, and quizzes are original to
this course. Materials referenced during preparation are listed in
the Reading section above; Cornell CS3110 and Real World OCaml
are CC BY-NC-ND-licensed and have not been derivatively reused.
See LICENSES.md
at the repository root for the full source posture.