Functions as values
In OCaml, a function is a value, the same way 42 and "hello"
are values. You can name a function with let, pass it as an
argument to another function, return it from a function, and store
it in a data structure. The slogan is that functions are
first-class values: they have the same rights as any other value
in the language. This lecture is about what that actually means in
practice, what new things you can express because of it, and the
syntax for creating functions on the fly.
If you arrived from C or Python, the phrase "functions are values"
may sound right but the practice takes adjusting to. Functions in C
are pointers to code, not really values: you can pass a function
pointer but you cannot create new functions at runtime. Functions
in Python are objects, closer to the OCaml picture, but their
defining-and-using syntax is heavyweight (def name(...) then a
body) compared to OCaml's. Lambda-expressions (added to Java in 8,
to C++ in 11)
are how mainstream languages have caught up to the OCaml-style
treatment of functions; you may already know them. In OCaml this
treatment is the default, not a bolt-on.
Module 3 is built around this idea. This first lecture establishes that functions are values; the next four lectures (recursion, currying, tail recursion, local and mutual recursion) put the machinery to work. By the end of the module you will be writing functions that produce functions, returning functions from functions, and reasoning about function types without effort.
Two ways to define a function
You already know one syntax for defining a function:
let name args = body is the everyday form, used in every example
so far. There is a second form, more verbose but more revealing:
The expression fun x -> x + x is an anonymous function. In other
language traditions this is called a lambda (after Alonzo
Church's lambda calculus, the mathematical foundation of functional
programming). The fun keyword introduces a parameter list (one
parameter, x, in this case); the -> separates the parameters
from the body; the body is the expression that defines what the
function does.
fun x -> x + x is, all on its own, a value. It has a type
(int -> int). It can be bound to a name with let. It can be
passed to another function. It can be returned from a function. It
can sit in a list. If you accept that 42 is a value with type
int, you should accept that fun x -> x + x is a value with type
int -> int. They participate in the language at exactly the same
level.
The shorthand let double x = x + x is syntactic sugar for
let double = fun x -> x + x. The compiler treats them as
identical. Use the shorthand for named functions; use the explicit
fun form for functions you do not want to name (one-off computations
passed to a higher-order function, for instance).
Anonymous functions in expressions
fun x -> e is an expression. It evaluates to a function value.
The toplevel reports int -> int = <fun> and binds the result to
_. The function exists; we just have not named it. We could
apply it on the spot:
The parentheses are necessary because function application binds
tighter than the fun syntax: without them, OCaml would parse
fun x -> x + 1 7 as fun x -> (x + 1 7), which is nonsense.
In practice you rarely apply an anonymous function on the spot (why not just write the value?), but you constantly pass anonymous functions as arguments to other functions, which we will see in a moment.
Multiple parameters
What about a function of two or more parameters? Three ways to write the same function:
All three define the same function. The third form makes something
striking explicit: a "two-argument function" in OCaml is really a
one-argument function that returns another one-argument function.
The fun x -> fun y -> x + y reads "given x, return the function
that, given y, returns x + y." This is called currying (after
Haskell Curry, the same person Haskell is named for); we devote
Lecture 3 of this module to it.
For now, the takeaway: fun x y -> ... and fun x -> fun y -> ...
are interchangeable. The compiler treats them identically. The
shorthand let add x y = ... desugars to the curried form.
Functions are values you can name
Once we accept that functions are values, three things follow: you can name them, you can pass them around, and you can return them. Let's see each.
Three let bindings. The shape is identical to let pi = 3.14,
or let greeting = "hello": a name bound to a value. The only
difference is that the value happens to be a function. The
language does not distinguish.
A small corner of this idea worth naming: a function whose
parameter is the unit value () is called a thunk. Its type is
unit -> 'a for some 'a. The body runs not when the thunk is
defined, but each time you call it with ():
greet is the thunk: a value of type unit -> string. The first
let _ = greet does not invoke the body; it simply names the
function value. The second, greet (), applies the thunk and
produces "hello". Each application re-runs the body. The thunk
is the canonical way to bundle a computation as a value and
decide later when to perform it; you will see thunks again in
the streams-and-laziness lecture,
where a stream's tail is exactly a unit -> 'a stream thunk.
Functions can be returned from other functions
The next step: a function whose return value is another function.
make_adder has type int -> (int -> int): it takes an int and
returns a function from int to int. The two calls produce two
different functions: plus_five always adds 5 (regardless of
what other code does); plus_ten always adds 10. Each call to
make_adder produces a fresh function value.
This is our first higher-order function: a function that takes or returns other functions. We will devote all of Module 6 to higher-order functions; for this lecture, just notice that the machinery exists and works.
A function value remembers its environment
Here is the subtle and important property. When plus_five 3
runs, the body is x + n. The parameter x is bound to 3. But
where does n come from? It is not a parameter of plus_five. It
was a parameter of make_adder, and make_adder has long since
returned.
The answer is that the function value plus_five does not just
hold "the function body"; it also holds a record of "what n was
when this function was created." The value n = 5 was captured
by the function at the moment of its creation. Such a function-value-with-captured-environment
is called a closure. We saw closures briefly in
the let-bindings lecture
(the function captures the value, not the name).
plus_five 100 returns 105. The captured n = 5 is used in
the body. The closure ensures that plus_five keeps working even
after make_adder has returned and its activation frame is gone.
What is a closure?
Here is the definition in precise form. A name that appears in a
function's body but is not bound by the function's parameters or
by any let inside the body is called a free variable of the
function. When the function is created at runtime, OCaml records
the current value of each free variable into the function value
itself; this recording step is called capture, and the table
of captured values is called the environment of the closure. A
closure, then, is a function value paired with its environment.
The point of the environment is that the captured values can be read long after the surrounding scope that originally bound them has gone away. The function value is self-contained: body plus environment.
One free variable:
The inner fun x -> x + n has parameter x and one free
variable n. Each call make_adder k returns a closure whose
environment contains n = k.
Two free variables:
The inner fun x -> ... has parameter x and two free
variables, lo and hi. between 0 10 returns a closure whose
environment contains lo = 0 and hi = 10; that closure then
takes one int and returns a bool.
A function with no free variables (fun x -> x + 1) is still a
closure, just with an empty environment.
Closures capture values, not names
The capture is of values at the time of creation, not of names that get looked up later.
What does f 1 return?
Returns 11. The closure captured n = 10 when f was defined;
the later let n = 99 shadows the outer n but does not change
the value f saw. This is the same point from
the let-bindings lecture
(shadowing is not mutation), applied to function closures.
OCaml uses static scoping with value capture: the binding that
was in scope when f was defined is the one f uses, forever. In
a language where names were re-looked-up at call time, or where the
closure held a reference to the variable rather than its value,
f 1 could see a later assignment. OCaml does neither.
Functions can be passed as arguments
The third thing you can do with a value: pass it as an argument.
apply_twice takes a function f and a value x, and computes
f (f x). The first call passes the anonymous function fun x -> x + 1 (so we get ((5 + 1) + 1) = 7); the second passes double
(so we get (5 * 2) * 2 = 20).
The toplevel reports val apply_twice : ('a -> 'a) -> 'a -> 'a = <fun>. Parsed: takes a function from 'a to 'a, plus an 'a,
and returns an 'a. The 'a is a type variable: it stands for
"some type, the same one in each occurrence." A function whose
type contains type variables is called polymorphic: it works at
every choice of 'a, with no special-casing per type. The same
apply_twice works for int -> int functions, float -> float
functions, string -> string functions, etc.
This is the first time we are seeing a type variable; we will
return to polymorphism formally in
the recursive-types lecture,
once we have parameterised variants (lists of anything) to motivate
it. For now, read 'a -> 'a as "any type to itself."
Anonymous functions are everywhere
You will write fun x -> ... a lot. It is the standard way to pass
a small one-off function to a higher-order utility like List.map.
List.map is the standard library function that applies a function
to every element of a list. We will see List.map
and its friends in detail in Module 6. For now, notice that the
natural way to write "the function x -> x * x" right at the call
site is the anonymous-function form fun x -> x * x.
Without anonymous functions, you would have to invent a name for this little computation:
Both are fine. The anonymous-function form is shorter and keeps the logic close to where it is used; the named form is reusable elsewhere. Idiomatic OCaml leans toward anonymous functions for small, single-use functions.
Reading function types: -> is right-associative
When you write int -> int -> int, OCaml reads this as int -> (int -> int). The arrow is right-associative.
val add : int -> int -> int
val apply_twice : ('a -> 'a) -> 'a -> 'a
int -> int -> int parses as int -> (int -> int): a function
that takes an int and returns a (int -> int) (which is itself
a function from int to int). The right-associative reading
matches currying: add 3 is meaningful (an int -> int function),
because add is really a one-argument function returning a function.
('a -> 'a) -> 'a -> 'a parses as ('a -> 'a) -> ('a -> 'a). The
parentheses around the first argument are necessary: without
them, the type would be 'a -> 'a -> 'a -> 'a, meaning "takes
three 'as and returns one." With them, "takes a function from
'a to 'a, then an 'a, and returns an 'a."
You will get fluent with practice. The trick: read left to right, inserting "and given an X, produces" between each arrow.
A quick check
What is the type of fun s -> s ^ "!"?
string -> stringchar -> string'a -> string'a -> 'a
Why: the literal "!" is a string, and the concatenation
operator ^ works on strings only. It forces its left operand
(the parameter s) to be string, and the result of the
concatenation is string. So the anonymous function is
string -> string. The operator pins both ends; nothing is left
polymorphic.
What does apply_twice (fun x -> x + 1) 5 evaluate to?
56710
Why: apply_twice f x = f (f x). With f = fun x -> x + 1
and x = 5: the inner f 5 is 6; the outer f 6 is 7.
A code challenge:
Define compose : ('b -> 'c) -> ('a -> 'b) -> ('a -> 'c) that
takes two functions g and f and returns the composed function
fun x -> g (f x).
Show reference solution
Reference solution: let compose g f = fun x -> g (f x), or
equivalently let compose g f x = g (f x). Function composition is
one of the most useful tools in functional programming; we will
return to it in Module 6.
Activity
The trickier one in the slide is the anonymous version of
apply_twice. The inferred type ('a -> 'a) -> 'a -> 'a is forced
by the body: f (f x) requires that the output of f be a valid
input to f, so f's output and input types must match (both
'a). The starting x must also be 'a (since it is fed to f).
The whole expression's result is f's output type, which is 'a.
What's next
We have established that functions are values. The next lecture,
M03-L02, covers recursion: functions
that call themselves. This is how OCaml expresses iteration; you
do not write for loops in OCaml (well, you can, but you rarely
will). After recursion, M03-L03 covers
currying in full,
M03-L04 covers tail recursion and
the accumulator pattern (the technique for making recursion fast),
and M03-L05 covers local and
mutual recursion. M03-L06 is the tutorial.
Reading
- Cornell CS3110, Functions: the corresponding chapter: https://cs3110.github.io/textbook/chapters/basics/functions.html
- Real World OCaml, Variables and functions: same material from a different angle: https://dev.realworldocaml.org/variables-and-functions.html
- John Whitington, OCaml from the Very Beginning, Chapter 3: for a gentler pace.
Sources
This lecture's prose, worked examples, and quizzes are original to
this course. Materials referenced during preparation are listed in
the Reading section above; Cornell CS3110 and Real World OCaml
are CC BY-NC-ND-licensed and have not been derivatively reused.
See LICENSES.md
at the repository root for the full source posture.