Functions as values

Functional Programming with OCaml

Functions as values, and anonymous functions

Module 3 · Lecture 1

KC Sivaramakrishnan
IIT Madras

In OCaml, a function is a value, the same way 42 and "hello" are values. You can name a function with let, pass it as an argument to another function, return it from a function, and store it in a data structure. The slogan is that functions are first-class values: they have the same rights as any other value in the language. This lecture is about what that actually means in practice, what new things you can express because of it, and the syntax for creating functions on the fly.

Functions are values

If you arrived from C or Python, the phrase "functions are values" may sound right but the practice takes adjusting to. Functions in C are pointers to code, not really values: you can pass a function pointer but you cannot create new functions at runtime. Functions in Python are objects, closer to the OCaml picture, but their defining-and-using syntax is heavyweight (def name(...) then a body) compared to OCaml's. Lambda-expressions (added to Java in 8, to C++ in 11) are how mainstream languages have caught up to the OCaml-style treatment of functions; you may already know them. In OCaml this treatment is the default, not a bolt-on.

Module 3 is built around this idea. This first lecture establishes that functions are values; the next four lectures (recursion, currying, tail recursion, local and mutual recursion) put the machinery to work. By the end of the module you will be writing functions that produce functions, returning functions from functions, and reasoning about function types without effort.

The plan for Module 3

Two ways to define a function

You already know one syntax for defining a function:

let double x = x + x

let name args = body is the everyday form, used in every example so far. There is a second form, more verbose but more revealing:

let double = fun x -> x + x

Two ways to define a function

You already know one:

let double x = x + x

This is syntactic sugar for:

let double = fun x -> x + x

The expression fun x -> x + x is an anonymous function. In other language traditions this is called a lambda (after Alonzo Church's lambda calculus, the mathematical foundation of functional programming). The fun keyword introduces a parameter list (one parameter, x, in this case); the -> separates the parameters from the body; the body is the expression that defines what the function does.

fun x -> x + x is, all on its own, a value. It has a type (int -> int). It can be bound to a name with let. It can be passed to another function. It can be returned from a function. It can sit in a list. If you accept that 42 is a value with type int, you should accept that fun x -> x + x is a value with type int -> int. They participate in the language at exactly the same level.

The shorthand let double x = x + x is syntactic sugar for let double = fun x -> x + x. The compiler treats them as identical. Use the shorthand for named functions; use the explicit fun form for functions you do not want to name (one-off computations passed to a higher-order function, for instance).

Anonymous functions in expressions

fun x -> e is an expression. It evaluates to a function value.

let _ = fun x -> x + 1 (* : int -> int = <fun> *)

The toplevel reports int -> int = <fun> and binds the result to _. The function exists; we just have not named it. We could apply it on the spot:

let _ = (fun x -> x + 1) 7 (* = 8 *)

Anonymous functions

let _ = fun x -> x + 1 (* : int -> int = <fun> *)

Anonymous functions, applied on the spot

let _ = (fun x -> x + 1) 7 (* = 8 *)

The parentheses are necessary because function application binds tighter than the fun syntax: without them, OCaml would parse fun x -> x + 1 7 as fun x -> (x + 1 7), which is nonsense.

In practice you rarely apply an anonymous function on the spot (why not just write the value?), but you constantly pass anonymous functions as arguments to other functions, which we will see in a moment.

Multiple parameters

What about a function of two or more parameters? Three ways to write the same function:

let add x y = x + y let add' = fun x y -> x + y let add'' = fun x -> fun y -> x + y

Multiple parameters

let add x y = x + y let add' = fun x y -> x + y let add'' = fun x -> fun y -> x + y

All three define the same function. The third form makes something striking explicit: a "two-argument function" in OCaml is really a one-argument function that returns another one-argument function. The fun x -> fun y -> x + y reads "given x, return the function that, given y, returns x + y." This is called currying (after Haskell Curry, the same person Haskell is named for); we devote Lecture 3 of this module to it.

For now, the takeaway: fun x y -> ... and fun x -> fun y -> ... are interchangeable. The compiler treats them identically. The shorthand let add x y = ... desugars to the curried form.

Functions are values you can name

Once we accept that functions are values, three things follow: you can name them, you can pass them around, and you can return them. Let's see each.

let plus_one = fun x -> x + 1 let double = fun x -> x * 2 let triple = fun x -> x * 3 let _ = plus_one 5 (* = 6 *) let _ = double 5 (* = 10 *) let _ = triple 5 (* = 15 *)

Functions are values you can name

let plus_one = fun x -> x + 1 let double = fun x -> x * 2 let triple = fun x -> x * 3 let _ = plus_one 5 (* = 6 *) let _ = double 5 (* = 10 *) let _ = triple 5 (* = 15 *)

Three let bindings. The shape is identical to let pi = 3.14, or let greeting = "hello": a name bound to a value. The only difference is that the value happens to be a function. The language does not distinguish.

A small corner of this idea worth naming: a function whose parameter is the unit value () is called a thunk. Its type is unit -> 'a for some 'a. The body runs not when the thunk is defined, but each time you call it with ():

let greet () = "hello" let _ = greet (* : unit -> string = <fun> *) let _ = greet () (* = "hello" *)

greet is the thunk: a value of type unit -> string. The first let _ = greet does not invoke the body; it simply names the function value. The second, greet (), applies the thunk and produces "hello". Each application re-runs the body. The thunk is the canonical way to bundle a computation as a value and decide later when to perform it; you will see thunks again in the streams-and-laziness lecture, where a stream's tail is exactly a unit -> 'a stream thunk.

Functions can be returned from other functions

The next step: a function whose return value is another function.

let make_adder n = fun x -> x + n let plus_five = make_adder 5 let plus_ten = make_adder 10 let _ = plus_five 3 (* = 8 *) let _ = plus_ten 3 (* = 13 *)

Functions can be returned from other functions

let make_adder n = fun x -> x + n let plus_five = make_adder 5 let plus_ten = make_adder 10 let _ = plus_five 3 (* = 8 *) let _ = plus_ten 3 (* = 13 *)

make_adder has type int -> (int -> int): it takes an int and returns a function from int to int. The two calls produce two different functions: plus_five always adds 5 (regardless of what other code does); plus_ten always adds 10. Each call to make_adder produces a fresh function value.

This is our first higher-order function: a function that takes or returns other functions. We will devote all of Module 6 to higher-order functions; for this lecture, just notice that the machinery exists and works.

A function value remembers its environment

Here is the subtle and important property. When plus_five 3 runs, the body is x + n. The parameter x is bound to 3. But where does n come from? It is not a parameter of plus_five. It was a parameter of make_adder, and make_adder has long since returned.

The answer is that the function value plus_five does not just hold "the function body"; it also holds a record of "what n was when this function was created." The value n = 5 was captured by the function at the moment of its creation. Such a function-value-with-captured-environment is called a closure. We saw closures briefly in the let-bindings lecture (the function captures the value, not the name).

A function value remembers its environment

let make_adder n = fun x -> x + n let plus_five = make_adder 5 let _ = plus_five 100 (* = 105 *)

plus_five 100 returns 105. The captured n = 5 is used in the body. The closure ensures that plus_five keeps working even after make_adder has returned and its activation frame is gone.

What is a closure?

Here is the definition in precise form. A name that appears in a function's body but is not bound by the function's parameters or by any let inside the body is called a free variable of the function. When the function is created at runtime, OCaml records the current value of each free variable into the function value itself; this recording step is called capture, and the table of captured values is called the environment of the closure. A closure, then, is a function value paired with its environment.

The point of the environment is that the captured values can be read long after the surrounding scope that originally bound them has gone away. The function value is self-contained: body plus environment.

One free variable:

let make_adder n = fun x -> x + n

The inner fun x -> x + n has parameter x and one free variable n. Each call make_adder k returns a closure whose environment contains n = k.

Two free variables:

let between lo hi = fun x -> lo <= x && x <= hi

The inner fun x -> ... has parameter x and two free variables, lo and hi. between 0 10 returns a closure whose environment contains lo = 0 and hi = 10; that closure then takes one int and returns a bool.

A function with no free variables (fun x -> x + 1) is still a closure, just with an empty environment.

What is a closure?

A function whose body refers to bindings that are in scope but are not parameters of the function.

A function value = its body + its environment.

Closures: one free variable

let make_adder n = fun x -> x + n

Closures: two free variables

let between lo hi = fun x -> lo <= x && x <= hi

Closures capture values, not names

The capture is of values at the time of creation, not of names that get looked up later.

let n = 10 let f = fun x -> x + n let n = 99 let _ = f 1

What does f 1 return?

Closures are not magic

let n = 10 let f = fun x -> x + n let n = 99 let _ = f 1

What does f 1 return? 11.

Returns 11. The closure captured n = 10 when f was defined; the later let n = 99 shadows the outer n but does not change the value f saw. This is the same point from the let-bindings lecture (shadowing is not mutation), applied to function closures.

OCaml uses static scoping with value capture: the binding that was in scope when f was defined is the one f uses, forever. In a language where names were re-looked-up at call time, or where the closure held a reference to the variable rather than its value, f 1 could see a later assignment. OCaml does neither.

Functions can be passed as arguments

The third thing you can do with a value: pass it as an argument.

let apply_twice f x = f (f x) let _ = apply_twice (fun x -> x + 1) 5 (* = 7 *) let _ = apply_twice double 5 (* = 20 *)

Functions can be passed as arguments

let apply_twice f x = f (f x) let _ = apply_twice (fun x -> x + 1) 5 (* = 7 *) let _ = apply_twice double 5 (* = 20 *)

apply_twice takes a function f and a value x, and computes f (f x). The first call passes the anonymous function fun x -> x + 1 (so we get ((5 + 1) + 1) = 7); the second passes double (so we get (5 * 2) * 2 = 20).

The toplevel reports val apply_twice : ('a -> 'a) -> 'a -> 'a = <fun>. Parsed: takes a function from 'a to 'a, plus an 'a, and returns an 'a. The 'a is a type variable: it stands for "some type, the same one in each occurrence." A function whose type contains type variables is called polymorphic: it works at every choice of 'a, with no special-casing per type. The same apply_twice works for int -> int functions, float -> float functions, string -> string functions, etc.

This is the first time we are seeing a type variable; we will return to polymorphism formally in the recursive-types lecture, once we have parameterised variants (lists of anything) to motivate it. For now, read 'a -> 'a as "any type to itself."

Anonymous functions are everywhere

You will write fun x -> ... a lot. It is the standard way to pass a small one-off function to a higher-order utility like List.map.

let nums = [1; 2; 3; 4; 5] let _ = List.map (fun x -> x * x) nums (* = [1; 4; 9; 16; 25] *)

Anonymous functions are everywhere

let nums = [1; 2; 3; 4; 5] let _ = List.map (fun x -> x * x) nums (* = [1; 4; 9; 16; 25] *)

Without anonymous functions:

let square x = x * x let _ = List.map square nums (* = [1; 4; 9; 16; 25] *)

List.map is the standard library function that applies a function to every element of a list. We will see List.map and its friends in detail in Module 6. For now, notice that the natural way to write "the function x -> x * x" right at the call site is the anonymous-function form fun x -> x * x.

Without anonymous functions, you would have to invent a name for this little computation:

let square x = x * x let _ = List.map square nums (* = [1; 4; 9; 16; 25] *)

Both are fine. The anonymous-function form is shorter and keeps the logic close to where it is used; the named form is reusable elsewhere. Idiomatic OCaml leans toward anonymous functions for small, single-use functions.

Reading function types: -> is right-associative

When you write int -> int -> int, OCaml reads this as int -> (int -> int). The arrow is right-associative.

val add : int -> int -> int
val apply_twice : ('a -> 'a) -> 'a -> 'a

Type signatures: -> is right-associative

val add : int -> int -> int
val apply_twice : ('a -> 'a) -> 'a -> 'a
  • Reads as ('a -> 'a) -> ('a -> 'a).
  • First arg is a function; parens make it explicit.

Read left-to-right, inserting "and given an X, produces":

"given an ('a -> 'a), produces (given an 'a, produces an 'a)"

int -> int -> int parses as int -> (int -> int): a function that takes an int and returns a (int -> int) (which is itself a function from int to int). The right-associative reading matches currying: add 3 is meaningful (an int -> int function), because add is really a one-argument function returning a function.

('a -> 'a) -> 'a -> 'a parses as ('a -> 'a) -> ('a -> 'a). The parentheses around the first argument are necessary: without them, the type would be 'a -> 'a -> 'a -> 'a, meaning "takes three 'as and returns one." With them, "takes a function from 'a to 'a, then an 'a, and returns an 'a."

You will get fluent with practice. The trick: read left to right, inserting "and given an X, produces" between each arrow.

A quick check

What is the type of fun s -> s ^ "!"?

Why: the literal "!" is a string, and the concatenation operator ^ works on strings only. It forces its left operand (the parameter s) to be string, and the result of the concatenation is string. So the anonymous function is string -> string. The operator pins both ends; nothing is left polymorphic.

What does apply_twice (fun x -> x + 1) 5 evaluate to?

Why: apply_twice f x = f (f x). With f = fun x -> x + 1 and x = 5: the inner f 5 is 6; the outer f 6 is 7.

A code challenge:

Define compose : ('b -> 'c) -> ('a -> 'b) -> ('a -> 'c) that takes two functions g and f and returns the composed function fun x -> g (f x).

let compose g f = failwith "not implemented"
Show reference solution

Reference solution: let compose g f = fun x -> g (f x), or equivalently let compose g f x = g (f x). Function composition is one of the most useful tools in functional programming; we will return to it in Module 6.

Activity

Activity

What does the toplevel report as the type of:

fun x -> x +. 1.0

Predict before binding it.

Activity discussion

fun x -> x +. 1.0

Trickier:

fun f x -> f (f x)

The trickier one in the slide is the anonymous version of apply_twice. The inferred type ('a -> 'a) -> 'a -> 'a is forced by the body: f (f x) requires that the output of f be a valid input to f, so f's output and input types must match (both 'a). The starting x must also be 'a (since it is fed to f). The whole expression's result is f's output type, which is 'a.

What's next

We have established that functions are values. The next lecture, M03-L02, covers recursion: functions that call themselves. This is how OCaml expresses iteration; you do not write for loops in OCaml (well, you can, but you rarely will). After recursion, M03-L03 covers currying in full, M03-L04 covers tail recursion and the accumulator pattern (the technique for making recursion fast), and M03-L05 covers local and mutual recursion. M03-L06 is the tutorial.

What's next

Lecture 2: recursion.

Reading

Sources

This lecture's prose, worked examples, and quizzes are original to this course. Materials referenced during preparation are listed in the Reading section above; Cornell CS3110 and Real World OCaml are CC BY-NC-ND-licensed and have not been derivatively reused. See LICENSES.md at the repository root for the full source posture.