Module basics

Functional Programming with OCaml

Module 7 · Lecture 6

KC Sivaramakrishnan
IIT Madras

So far every piece of OCaml we have written has lived at the top level: a string of lets, types, and let recs, each defined in a shared global namespace. That worked for the kinds of examples this course has covered: a single file, a few dozen definitions, every name visible everywhere. It would stop working the instant we tried to write a real program.

Real programs need structure. They have hundreds of definitions spread across dozens of files. Names collide: every module wants to define length, every module wants to define to_string, every module wants to define map. Implementations have internal helpers that no caller should depend on. Two libraries written by different teams need to coexist without clashing.

OCaml's answer to all of this is the module system. A module is a named collection of definitions (values, types, exceptions, even sub-modules) that you can refer to as a unit. The standard library you have been using all course is a tree of modules: List, String, Array, Option, Map, and so on. A real OCaml project is a tree of your own modules built on top of them. This lecture introduces the syntax. The signatures lecture covers the type-level description of a module that lets you hide internals. The functors lecture covers modules parameterized by other modules. The module tutorial closes this part of the course.

Inline modules

The simplest way to define a module is right at the toplevel, with module Name = struct ... end.

module Greet = struct
  let hello name = "hello, " ^ name
  let goodbye name = "goodbye, " ^ name
end
let _ = Greet.hello "world"    (* = "hello, world" *)
let _ = Greet.goodbye "world"  (* = "goodbye, world" *)

The shape is consistent throughout the language: module Name = struct ... end is to modules what let name = ... is to values. You bind a name (Greet) to a value (the structure between struct and end). Inside the structure, you write top-level definitions exactly as you would in a .ml file. Outside, you refer to those definitions with the dot notation Greet.hello, Greet.goodbye.

A module name must start with an uppercase letter. The convention is CamelCase (or Snake_case with the first letter capitalised, as in Pretty_printer). This is unlike value names, which start with lowercase. The capitalisation distinction is enforced by the syntax: Greet.hello is unambiguous because the compiler knows Greet is a module reference (uppercase) and hello is a value inside it (lowercase).

Every .ml file is a module

Inside a real OCaml project, you do not usually write module Foo = struct ... end to make a module called Foo. You write a file called foo.ml. The compiler automatically wraps the file's contents in a structure and exposes it as the module Foo. Other files reference its contents as Foo.x, Foo.f, and so on.

This is how the standard library is structured: there is a list.ml in the OCaml source tree that exposes the List module, a string.ml for String, an array.ml for Array. Your project will look the same: you have a parser.ml, a pretty_printer.ml, a cache.ml, each becoming Parser, Pretty_printer, Cache to the rest of the program.

For this lecture, because we are writing examples that run in a single toplevel cell, we use the inline module M = struct ... end form. In your projects, you will write the contents directly in a .ml file and the compiler will do the wrapping for you.
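As a rough illustration of that equivalence, the sketch below writes by hand the wrapper the compiler supplies for a file. The file names greet.ml and main.ml are hypothetical and appear only in comments; both "files" are collapsed into one block so that it runs in a single toplevel cell.

```ocaml
(* A hypothetical file greet.ml would contain only the two definitions
   below, with no `module Greet = struct ... end` around them. The wrapper
   is written out here to mimic what the compiler does for the file. *)
module Greet = struct
  let hello name = "hello, " ^ name
  let goodbye name = "goodbye, " ^ name
end

(* A hypothetical file main.ml: it reaches the other file's contents
   through the module name derived from the file name. *)
let greeting = Greet.hello "world"
let () = print_endline greeting
```

In an actual project the two halves would live in separate files and the build would link them; the dot notation on the caller's side stays the same.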

Modules contain types too

A module is not just a namespace for values. It can hold types, exceptions, and sub-modules as well.

module Color = struct
  type t = Red | Green | Blue
  let to_string = function
    | Red -> "red"
    | Green -> "green"
    | Blue -> "blue"
end
let c : Color.t = Color.Red
let _ = Color.to_string c  (* = "red" *)

Notice the dot notation extends uniformly: Color.t is the type defined inside Color, Color.Red is one of its constructors, Color.to_string is the function. From outside, you always go through the module name.
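The same holds for exceptions. The sketch below uses a hypothetical Weekday module (not part of the lecture's examples) to show an exception being defined inside a module and caught from outside through the module name.

```ocaml
module Weekday = struct
  type t = Mon | Tue | Wed | Thu | Fri

  (* An exception defined inside the module; from outside it is
     referred to as Weekday.Unknown_day. *)
  exception Unknown_day of string

  let of_string = function
    | "mon" -> Mon
    | "tue" -> Tue
    | "wed" -> Wed
    | "thu" -> Thu
    | "fri" -> Fri
    | s -> raise (Unknown_day s)
end

let d : Weekday.t = Weekday.of_string "tue"

(* Catching the exception from outside also goes through the module name. *)
let parsed_ok s =
  match Weekday.of_string s with
  | _ -> true
  | exception Weekday.Unknown_day _ -> false
```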

There is a strong convention in OCaml for modules that are about one principal type: name that type t. So Color.t, not Color.color; Queue.t, not Queue.queue; Hashtbl.t, not Hashtbl.hashtbl. The thinking is that the module name already says what kind of thing it is, and the type name does not need to repeat it. The result is the cleaner-reading Color.t style you see throughout the standard library: String.t, List.t, Bytes.t.
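One way to see the convention pay off is to put two such modules side by side. In this sketch, Size is a hypothetical companion to the Color module; both define a type t and a function to_string, and the module prefix is what tells them apart.

```ocaml
module Color = struct
  type t = Red | Green | Blue
  let to_string = function
    | Red -> "red"
    | Green -> "green"
    | Blue -> "blue"
end

module Size = struct
  type t = Small | Large
  let to_string = function
    | Small -> "small"
    | Large -> "large"
end

(* Both modules define t and to_string; the prefixes keep them apart,
   and the signature reads naturally as Color.t -> Size.t -> string. *)
let describe (c : Color.t) (s : Size.t) =
  Size.to_string s ^ " " ^ Color.to_string c

let label = describe Color.Green Size.Large  (* = "large green" *)
```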

open brings names into scope

If you find yourself writing Greet.hello and Greet.goodbye repeatedly in a block of code, you can open the module to drop the prefix.

module Greet = struct
  let hello name = "hello, " ^ name
  let goodbye name = "goodbye, " ^ name
end
let _ =
  let open Greet in
  hello "alice" ^ "; " ^ goodbye "alice"
  (* = "hello, alice; goodbye, alice" *)

The let open M in expr form opens M only inside expr. Outside that expression, you still need the Greet. prefix. This is called the local open, and it is the form to reach for: it keeps the scope of the open visible to the reader. Anyone reading the code sees right where the open begins and ends.

The other open form is open M at the top level of a file or module, which brings every name from M into scope for the rest of the file. For small modules with little chance of name collision, this is fine. For large modules like List or String, opening globally is risky: too many short names get pulled in, and a reader of the code can no longer tell at a glance whether map means List.map or String.map or something else entirely. The local open avoids that question by limiting the scope.
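A small sketch of the hazard, assuming a hypothetical Limits module whose names happen to clash with ones from the standard library. After the global open, an existing name quietly changes meaning for the rest of the file.

```ocaml
(* Hypothetical module whose names clash with Stdlib's max and min. *)
module Limits = struct
  let max = 100
  let min = 0
end

(* Before the open, max is the standard two-argument function. *)
let larger = max 1 2  (* = 2 *)

open Limits

(* After the open, max refers to Limits.max, a plain int.
   Writing `max 1 2` at this point would be a type error. *)
let ceiling = max  (* = 100 *)

(* The original is still reachable through its full path. *)
let larger_again = Stdlib.max 1 2  (* = 2 *)
```

Here the clash surfaces as a type error, which is the lucky case; when the shadowing name has a compatible type, the code compiles and silently does something different.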

There is a middle form, M.(expr), which opens M in just the parenthesised expression. It is even shorter than a local open when you have a tight cluster of references:

let _ = String.(length "x" + length "yy")      (* = 3 *)
let _ = List.(map (fun x -> x * 2) [1; 2; 3])  (* = [2; 4; 6] *)

The same shorthand works with list brackets, M.[ ... ]:

let _ = List.[1; 2; 3]  (* = [1; 2; 3] *)

Use whichever feels clearest at the call site.

Hiding internals: the natural limitation

Inside a module you can define helper functions and values that you do not intend to be used from outside.

module Counter = struct
  let n = ref 0
  let next () = incr n; !n
  let reset () = n := 0
end
let _ = Counter.next ()  (* = 1 *)
let _ = Counter.next ()  (* = 2 *)
let _ = !Counter.n       (* = 2 *)

Here is the catch: without an interface, every definition in the module is visible to the outside world. The n ref is meant to be private state, used only by next and reset. But callers can read it with !Counter.n and even write to it with Counter.n := -100. The encapsulation is purely conventional.
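To make the leak concrete, here is a short sketch in which outside code overwrites the counter's state and the next call carries on from the corrupted value. The names first and after_tamper are introduced only for this illustration.

```ocaml
module Counter = struct
  let n = ref 0
  let next () = incr n; !n
  let reset () = n := 0
end

let first = Counter.next ()  (* = 1 *)

(* Outside code reaches straight into the module's state... *)
let () = Counter.n := -100

(* ...and the next call continues from whatever was written there. *)
let after_tamper = Counter.next ()  (* = -99 *)
```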

This is the major motivation for the next lecture. The fix is to constrain the module with a signature: a type-level description that lists exactly which names escape, with which types. Anything not in the signature is invisible from the outside. We hold off on the details until the signatures lecture.

Modules can nest

Modules can contain sub-modules:

module Geometry = struct
  module Point = struct
    type t = { x : float; y : float }
    let origin = { x = 0.0; y = 0.0 }
    let make x y = { x; y }
  end
  module Vector = struct
    type t = { dx : float; dy : float }
    let zero = { dx = 0.0; dy = 0.0 }
  end
end
let p = Geometry.Point.make 3.0 4.0
let _ = p.x  (* = 3. *)

The dot notation extends to any depth for module paths: Geometry.Point.t, Geometry.Point.make, and so on. The field access on the last line is just p.x, not p.Geometry.Point.x: OCaml already knows p : Geometry.Point.t, so it resolves the label x from that type (type-directed disambiguation). You need the path to reach the module's values, not to read a field off a value whose type is known.
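The sketch below contrasts the cases where a record label does and does not need the module path. The helper names get_x, get_y, and q are made up for the illustration; Geometry is repeated in trimmed form so the block is self-contained.

```ocaml
module Geometry = struct
  module Point = struct
    type t = { x : float; y : float }
    let make x y = { x; y }
  end
end

(* The type of p is not known in advance, so the label carries the
   path; this is what tells the compiler which record type is meant. *)
let get_x p = p.Geometry.Point.x

(* With a type annotation on p, the bare label is enough. *)
let get_y (p : Geometry.Point.t) = p.y

(* Building a record from outside the module: qualifying the first
   label is sufficient, the remaining labels follow from it. *)
let q = { Geometry.Point.x = 1.0; y = 2.0 }
```

The compiler can report the unqualified form in get_y as warning 40 (name out of scope) when that warning is enabled; qualifying the label, as in get_x, is the form that works regardless of what is known about the type.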

In real projects, each file is a module

In a real project, each .ml file is automatically a module named after the file: point.ml defines Point, vector.ml defines Vector. These are top-level modules, so you write Point.t and Vector.t, not Geometry.Point.t. The core OCaml module system knows nothing about directories: putting the files in a geometry/ directory does not by itself create a Geometry module wrapping them.

Getting a Geometry.Point namespace at project scale is the job of the build system. Dune, for instance, can wrap a library so its files become submodules of a single module named after the library (a library geometry exposes Geometry.Point, Geometry.Vector). That is a build-time convenience layered on top of the language, not part of the module system itself. The inline module Point = struct ... end nesting above is how you build the tree within a single file.

Modules are not first-class

OCaml modules live at a different level from values. You cannot pass a module as an argument to a regular function. You cannot store a module in a list. You cannot return one from an if. This is what people mean when they say OCaml has two type systems: the value-level type system you have been using all course, and a separate module-level type system on top.

There is an extension called first-class modules that does let you package a module as a value ((module M : T)) and unpack it on the other end ((val v : T)). It is occasionally useful, but not for everyday code. For the standard module toolkit this lecture introduces, modules live at compile time and are used statically. The "function-like" thing that takes a module and returns a module is called a functor, and we cover it in the functors lecture.
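For the curious, a minimal sketch of the packing and unpacking syntax follows. It leans on a module type (SHOW), which is signature syntax from the next lecture, and every name in it (SHOW, Decimal, Hex, printers, render) is hypothetical. Treat it as a preview, not as part of the toolkit this lecture covers.

```ocaml
(* A signature that both modules below satisfy. *)
module type SHOW = sig
  val name : string
  val show : int -> string
end

module Decimal = struct
  let name = "decimal"
  let show = string_of_int
end

module Hex = struct
  let name = "hex"
  let show n = Printf.sprintf "0x%x" n
end

(* Pack each module as an ordinary value, so they can sit in a list. *)
let printers = [ (module Decimal : SHOW); (module Hex : SHOW) ]

(* Unpack each value back into a module before using it. *)
let render n =
  List.map
    (fun p ->
      let module P = (val p : SHOW) in
      P.name ^ ": " ^ P.show n)
    printers
```

Calling render 255 yields one string per packed module, here "decimal: 255" and "hex: 0xff".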

A quick check

Given this module:

module M = struct
  type t = int
  let zero = 0
  let succ x = x + 1
end

What is the value of M.succ M.zero?

Why: M.zero is 0. M.succ is the function fun x -> x + 1. Applied to 0, it returns 1. The dot notation works for both values and types: M.t is the type, M.zero is the value.

Why does the convention type t (not type color, type stack, etc.) exist?

Why: Color.t is cleaner than Color.color: the module name provides the context, and the short t avoids stutter. The convention is widespread in the standard library (List.t, String.t, Bytes.t) and idiomatic in third-party code.

Activity

Define a module Stack with a mutable integer stack: push : int -> unit, pop : unit -> int option, peek : unit -> int option.

module Stack = struct
  let push (_ : int) : unit = failwith "not implemented"
  let pop () : int option = failwith "not implemented"
  let peek () : int option = failwith "not implemented"
end

Activity solution: the module

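module Stack = struct
  (* The annotation pins the element type to int; a bare ref [] would
     be given a weak, not-yet-determined element type. *)
  let s : int list ref = ref []
  let push x = s := x :: !s
  let pop () =
    match !s with
    | [] -> None
    | x :: rest -> s := rest; Some x
  let peek () =
    match !s with
    | [] -> None
    | x :: _ -> Some x
end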
  • Private state s, a ref of a list, lives inside the module.
  • push / pop / peek all mutate or read the same cell.

Activity solution: using it

let () = Stack.push 1
let () = Stack.push 2
let () = Stack.push 3
let _ = Stack.peek ()  (* = Some 3 *)
let _ = Stack.pop ()   (* = Some 3 *)
let _ = Stack.pop ()   (* = Some 2 *)
  • Push 1, 2, 3; peek reads the top (Some 3) without removing.
  • pop returns and removes: Some 3, then Some 2.
  • There's one stack, shared by every caller: the simplest design.
  • For multiple independent stacks: create values, not modules;
    • make the stack a type passed to the operations, or have a make () return a fresh one.

A few things to notice. The state is a ref of an int list, held by the module itself. Every caller of Stack.push mutates the same list; there is exactly one stack in the program. If you wanted multiple independent stacks, you would either (a) make the stack a type with operations that take and return stacks (the functional style we will see in the module tutorial), or (b) provide a constructor Stack.make () that returns a fresh ref each time.

The other thing to notice: the s ref is visible to outside code, just like the n ref in the Counter example earlier. Someone could write Stack.s := [] from outside and break the abstraction. Hiding s is the job of a signature, which is the next lecture.
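Option (b) above, a make () constructor, might look roughly like the following. This is one possible sketch, not the reference design: the module name Int_stack and the choice to pass the stack as the first argument are assumptions made for the illustration.

```ocaml
module Int_stack = struct
  (* Each stack is its own mutable cell holding an int list. *)
  type t = int list ref

  (* Every call allocates a fresh, independent stack. *)
  let make () : t = ref []

  let push (s : t) x = s := x :: !s

  let pop (s : t) =
    match !s with
    | [] -> None
    | x :: rest -> s := rest; Some x

  let peek (s : t) =
    match !s with
    | [] -> None
    | x :: _ -> Some x
end

(* Two stacks that do not share state. *)
let a = Int_stack.make ()
let b = Int_stack.make ()

let () =
  Int_stack.push a 1;
  Int_stack.push a 2;
  Int_stack.push b 10
```

After these pushes, Int_stack.peek a is Some 2 and Int_stack.peek b is Some 10. The state now lives in values rather than in the module, so there can be as many stacks as the program needs. The representation is still exposed, though: a caller can write a := [] directly.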

What's next

The next lecture introduces signatures, the type-level description of a module. A signature lists which names escape and at what type; everything else is hidden. Once you constrain a module by a signature, the internal refs and helper functions become invisible to the outside world, and you can change them later without breaking any caller. This is OCaml's encapsulation story.

Lecture 7: module signatures.

Reading

Sources

This lecture's prose, worked examples, and quizzes are original to this course. Materials referenced during preparation are listed in the Reading section above; Cornell CS3110 and Real World OCaml are CC BY-NC-ND-licensed and have not been derivatively reused. See LICENSES.md at the repository root for the full source posture.