Module basics
So far every piece of OCaml we have written has lived at the top
level: a string of lets, types, and let recs, each defined
in a shared global namespace. That worked for the kinds of
examples this course has covered: a single file, a few dozen
definitions, every name visible everywhere. It would stop working
the instant we tried to write a real program.
Real programs need structure. They have hundreds of definitions
spread across dozens of files. Names collide: every module wants
to define length, every module wants to define to_string,
every module wants to define map. Implementations have internal
helpers that no caller should depend on. Two libraries written by
different teams need to coexist without clashing.
OCaml's answer to all of this is the module system. A module is
a named collection of definitions (values, types,
exceptions, even sub-modules) that you
can refer to as a unit. The standard library you have been using
all course is a tree of modules:
List, String, Array,
Option, Map, and so on. A
real OCaml project is a tree of your own modules built on top of
them. This lecture introduces the syntax. The
signatures lecture covers the
type-level description of a module that lets you hide internals.
The functors lecture covers modules
parameterized by other modules. The
module tutorial closes the unit.
Inline modules
The simplest way to define a module is right at the toplevel,
with module Name = struct ... end.
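As a minimal sketch of this form, here is a small Greet module. Only the names Greet, hello and goodbye come from the discussion that follows; the argument and the function bodies are assumptions made for illustration.

```ocaml
(* Hypothetical module: the bodies of hello and goodbye are assumed. *)
module Greet = struct
  let hello name = "Hello, " ^ name ^ "!"
  let goodbye name = "Goodbye, " ^ name ^ "."
end

(* Outside the structure, the definitions are reached with dot notation. *)
let greeting = Greet.hello "Ada"
let farewell = Greet.goodbye "Ada"
```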
The shape is consistent throughout the language: module Name = struct ... end is to modules what let name = ... is to values.
You bind a name (Greet) to a structure (everything between
struct and end). Inside the structure, you write top-level
definitions exactly as you would in a .ml file. Outside, you
refer to those definitions with dot notation: Greet.hello,
Greet.goodbye.
A module name must start with an uppercase letter. The convention
is CamelCase (or Snake_case with the first letter capitalised,
as in Pretty_printer). This is unlike value names, which start
with lowercase. The capitalisation distinction is enforced by the
syntax: Greet.hello is unambiguous because the compiler knows
Greet is a module reference (uppercase) and hello is a value
inside it (lowercase).
Every .ml file is a module
Inside a real OCaml project, you do not usually write module Foo = struct ... end to make a module called Foo. You write a file
called foo.ml. The compiler automatically wraps the file's
contents in a structure and exposes it as the module Foo. Other
files reference its contents as Foo.x, Foo.f, and so on.
This is how the standard library is structured: there is a
list.ml in the OCaml source tree that exposes the List module,
a string.ml for String, an array.ml for Array. Your
project will look the same: you have a parser.ml, a
pretty_printer.ml, a cache.ml, each becoming Parser,
Pretty_printer, Cache to the rest of the program.
For this lecture, because we are writing examples that run in a
single toplevel cell, we use the inline module M = struct ... end form. In your projects, you will write the contents directly
in a .ml file and the compiler will do the wrapping for you.
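The correspondence between a file and an inline module can be sketched as follows. The file name cache.ml and its two definitions are hypothetical, chosen only to show that the inline form and the file form expose the same names.

```ocaml
(* Suppose a hypothetical file cache.ml contained exactly these two lines:

     let capacity = 16
     let describe () = Printf.sprintf "cache with %d slots" capacity

   Other files would then see it as if it had been written inline: *)
module Cache = struct
  let capacity = 16
  let describe () = Printf.sprintf "cache with %d slots" capacity
end

(* Callers use the same dot notation in both cases. *)
let summary = Cache.describe ()
```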
Modules contain types too
A module is not just a namespace for values. It can hold types, exceptions, and sub-modules as well.
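A minimal sketch of a module built around a type might look like this. The names Color, t, Red and to_string match the discussion that follows; the other constructors, the exception and of_string are assumptions added for illustration.

```ocaml
module Color = struct
  (* The principal type of the module. *)
  type t = Red | Green | Blue

  (* Hypothetical exception, raised by of_string on unknown input. *)
  exception Unknown_color of string

  let to_string = function
    | Red -> "red"
    | Green -> "green"
    | Blue -> "blue"

  let of_string = function
    | "red" -> Red
    | "green" -> Green
    | "blue" -> Blue
    | other -> raise (Unknown_color other)
end

(* Types, constructors and functions are all reached through the module name. *)
let favourite : Color.t = Color.Red
let label = Color.to_string favourite
```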
Notice the dot notation extends
uniformly: Color.t is the type defined inside Color,
Color.Red is one of its constructors, Color.to_string is the
function. From outside, you always go through the module name.
There is a strong convention in OCaml for modules that are about
one principal type: name that type t. So Color.t, not
Color.color; Queue.t, not Queue.queue; Hashtbl.t, not
Hashtbl.hashtbl. The thinking is that the module name already says
what kind of thing it is, and the type name does not need to repeat
it. The result is the cleaner-reading Color.t style you see
throughout the standard library: String.t, List.t, Bytes.t.
open brings names into scope
If you find yourself writing Greet.hello and Greet.goodbye
repeatedly in a block of code, you can open the module to drop
the prefix.
The let open M in expr form opens M only inside expr.
Outside that expression, you still need the Greet. prefix. This
is called the local open, and it is the form to reach for: it
keeps the scope of the open visible to the reader. Anyone reading
the code sees right where the open begins and ends.
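The local open described above can be sketched as follows, assuming a hypothetical Greet module whose function bodies are made up for illustration.

```ocaml
(* Hypothetical module: the bodies of hello and goodbye are assumed. *)
module Greet = struct
  let hello name = "Hello, " ^ name ^ "!"
  let goodbye name = "Goodbye, " ^ name ^ "."
end

(* Inside the local open, hello and goodbye need no prefix. *)
let both =
  let open Greet in
  hello "Ada" ^ " " ^ goodbye "Ada"

(* The open has ended, so the prefix is required again. *)
let later = Greet.hello "Bob"
```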
The other open form is open M at the top level of a file or
module, which brings every name from M into scope for the rest
of the file. For small modules with little chance of name
collision, this is fine. For large modules like List or
String, opening globally is risky: too many short names get
pulled in, and a reader of the code can no longer tell at a
glance whether map means List.map or String.map or
something else entirely. The local open avoids that question by
limiting the scope.
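To illustrate the risk, here is a small sketch in which two global opens make the same unqualified name mean two different things within a few lines. The value names doubled and shouted are made up for this example.

```ocaml
open List

(* Here map is List.map. *)
let doubled = map (fun x -> 2 * x) [ 1; 2; 3 ]

open String

(* From this point on, map is String.map: the later open shadows the earlier one. *)
let shouted = map Char.uppercase_ascii "abc"
```

A reader who lands in the middle of such a file has to scroll up to find out which map is in force.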
There is also a shorthand, M.(expr), which opens M in just the
parenthesised expression. It is even shorter than let open M in
when you have a tight cluster of references.
Use whichever feels clearest at the call site.
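As a sketch of the M.(expr) form, consider a hypothetical Vec2 module (not part of the lecture's examples) where several operations from the same module appear in one expression.

```ocaml
(* Hypothetical two-dimensional vector module, for illustration only. *)
module Vec2 = struct
  type t = { x : float; y : float }

  let make x y = { x; y }
  let add a b = { x = a.x +. b.x; y = a.y +. b.y }
  let scale k v = { x = k *. v.x; y = k *. v.y }
end

let v = Vec2.make 1.0 2.0
let w = Vec2.make 0.5 0.5

(* Fully prefixed version. *)
let prefixed = Vec2.add (Vec2.scale 2.0 v) w

(* Same computation with the module opened only inside the parentheses. *)
let opened = Vec2.(add (scale 2.0 v) w)
```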
Hiding internals: the natural limitation
Inside a module you can define helper functions and values that you do not intend to be used from outside.
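A minimal sketch of such a module follows. The names Counter, n, next and reset match the discussion below; the exact bodies (for example, that next returns the incremented value) are assumptions.

```ocaml
module Counter = struct
  (* Intended as private state. *)
  let n = ref 0

  (* Assumed behaviour: increment, then return the new value. *)
  let next () =
    incr n;
    !n

  let reset () = n := 0
end

(* Intended use: go through next and reset only. *)
let first = Counter.next ()
let second = Counter.next ()
let () = Counter.reset ()
let after_reset = Counter.next ()

(* Nothing stops outside code from touching n directly. *)
let () = Counter.n := -100
let tampered = Counter.next ()
```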
Here is the catch: without an interface, every definition in
the module is visible to the outside world. The n ref is meant
to be private state, used only by next and reset. But callers
can read it with !Counter.n and even write to it with
Counter.n := -100. The encapsulation is purely conventional.
This is the major motivation for the next lecture. The fix is to constrain the module with a signature: a type-level description that lists exactly which names escape, with which types. Anything not in the signature is invisible from the outside. We hold off on the details until the signatures lecture.
Modules can nest
Modules can contain sub-modules:
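A minimal sketch of nesting, using the Geometry and Point names from the discussion that follows. The record fields and the signature of make are assumptions made for illustration.

```ocaml
module Geometry = struct
  module Point = struct
    (* Assumed representation: a record of two floats. *)
    type t = { x : float; y : float }

    let make x y = { x; y }
  end
end

(* Module paths are spelled out in full from the outside. *)
let p : Geometry.Point.t = Geometry.Point.make 1.0 2.0

(* The field is read off a value whose type is already known. *)
let px = p.x
```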
The dot notation extends to any depth for
module paths: Geometry.Point.t, Geometry.Point.make, and so
on. The field access on the last line is just p.x, not
p.Geometry.Point.x: OCaml already knows p : Geometry.Point.t,
so it resolves the label x from that type (type-directed
disambiguation). You need the path to reach the module's
values, not to read a field off a value whose type is known.
In a real project, each .ml file is automatically a module
named after the file: point.ml defines Point, vector.ml
defines Vector. These are top-level modules, so you write
Point.t and Vector.t, not Geometry.Point.t. The core OCaml
module system knows nothing about directories: putting the files
in a geometry/ directory does not by itself create a
Geometry module wrapping them.
Getting a Geometry.Point namespace at project scale is the job
of the build system. Dune, for instance, can wrap a library so
its files become submodules of a single module named after the
library (a library geometry exposes Geometry.Point,
Geometry.Vector). That is a build-time convenience layered on
top of the language, not part of the module system itself. The
inline module Point = struct ... end nesting above is how you
build the tree within a single file.
Modules are not first-class
OCaml modules live at a different level from values. You cannot
pass a module as an argument to a regular function. You cannot
store a module in a list. You cannot return one from an if.
This is what people mean when they say OCaml has two type
systems: the value-level type system you have been using all
course, and a separate module-level type system on top.
There is an extension called first-class modules that does let
you package a module as a value, written (module M : T), and unpack
it on the other end, written (val v : T). It is occasionally useful,
but not for everyday code. For the standard module toolkit this
lecture introduces, modules live at compile time and are used
statically. The "function-like" thing that takes a module and
returns a module is called a functor, and we cover it in the
functors lecture.
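For the curious, the first-class module extension can be sketched as below. This is only a glimpse under stated assumptions: the signature SHAPE and both modules are hypothetical, and the module type syntax it relies on belongs to the signatures lecture.

```ocaml
(* Hypothetical signature and modules, for illustration only. *)
module type SHAPE = sig
  val name : string
  val area : float
end

module Unit_square = struct
  let name = "unit square"
  let area = 1.0
end

module Double_square = struct
  let name = "2x2 square"
  let area = 4.0
end

(* Packing turns each module into an ordinary value, so they fit in a list. *)
let shapes : (module SHAPE) list =
  [ (module Unit_square : SHAPE); (module Double_square : SHAPE) ]

(* Unpacking recovers a module that can be used with dot notation. *)
let describe (v : (module SHAPE)) =
  let module S = (val v : SHAPE) in
  Printf.sprintf "%s: %.1f" S.name S.area

let descriptions = List.map describe shapes
```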
A quick check
Given this module:
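A module consistent with the explanation given after the question would look like the sketch below. The names M, t, zero and succ come from that explanation; defining t as int is an assumption.

```ocaml
module M = struct
  type t = int (* assumed definition of t *)

  let zero = 0
  let succ x = x + 1
end
```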
What is the value of M.succ M.zero?
- 0
- 1
- M.succ
- error
Why: M.zero is 0. M.succ is the function fun x -> x + 1. Applied to 0, it returns 1. The dot notation works for
both values and types: M.t is the type, M.zero is the value.
Why does the convention type t (not type color, type stack,
etc.) exist?
- To save typing.
- Because OCaml requires it.
- Because the module name already says what the type is about, so the type name does not need to repeat it.
- To match Haskell.
Why: Color.t is cleaner than Color.color: the module name
provides the context, and the short t avoids stutter. The
convention is widespread in the standard library (List.t,
String.t, Bytes.t) and idiomatic in third-party code.
Activity
Define a Stack module holding integer state.
Reference solution
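One possible solution is sketched here; it is not necessarily the course's own listing. The names Stack, s and push match the notes that follow, while pop, is_empty and their return types are assumptions.

```ocaml
module Stack = struct
  (* The single shared stack, held by the module itself. *)
  let s : int list ref = ref []

  let push x = s := x :: !s

  (* Assumed interface: pop returns None on an empty stack. *)
  let pop () =
    match !s with
    | [] -> None
    | x :: rest ->
        s := rest;
        Some x

  let is_empty () = !s = []
end
```

Note that this Stack shadows the standard library module of the same name for the rest of the file.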
A few things to notice. The state is a ref
of an int list, held by the module itself. Every caller of
Stack.push mutates the same list; there is exactly one stack in
the program. If you wanted multiple independent stacks, you would
either (a) make the stack a type with operations that take and
return stacks (the functional style we will see in the
module tutorial), or (b) provide a constructor
Stack.make () that returns a fresh ref each time.
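Option (b) can be sketched as follows, assuming each operation takes the stack as its first argument. The signatures are hypothetical; only the name Stack.make comes from the text above.

```ocaml
module Stack = struct
  (* Each call allocates a fresh, independent ref. *)
  let make () : int list ref = ref []

  let push s x = s := x :: !s

  let pop s =
    match !s with
    | [] -> None
    | x :: rest ->
        s := rest;
        Some x
end

(* Two stacks that do not share state. *)
let a = Stack.make ()
let b = Stack.make ()
let () = Stack.push a 1
let () = Stack.push b 2
```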
The other thing to notice: the s ref is visible to outside
code, just like the n ref in the Counter example earlier.
Someone could write Stack.s := [] from outside and break the
abstraction. Hiding s is the job of a signature, which is the
next lecture.
What's next
The next lecture introduces signatures,
the type-level description of a module. A signature lists which
names escape and at what type; everything else is hidden. Once you
constrain a module by a signature, the internal refs and helper
functions become invisible to the outside world, and you can change
them later without breaking any caller. This is OCaml's
encapsulation story.
Reading
- Cornell CS3110, Modules: https://cs3110.github.io/textbook/chapters/modules/modules.html
- Real World OCaml, Files, Modules, and Programs: https://dev.realworldocaml.org/files-modules-and-programs.html
Sources
This lecture's prose, worked examples, and quizzes are original to
this course. Materials referenced during preparation are listed in
the Reading section above; Cornell CS3110 and Real World OCaml
are CC BY-NC-ND-licensed and have not been derivatively reused.
See LICENSES.md
at the repository root for the full source posture.