Memoization
The previous lecture ended
on a small puzzle. Lazy.t was faster than a plain thunk for
streams because forcing a Lazy.t runs the body once and
caches the result. That trick (compute on first call, save the
answer, return the saved answer on every subsequent call) has a
name: memoization. It is one of the oldest tricks in
functional programming, and it has uses well beyond streams.
This lecture lifts memoization to a general technique. We write
a memo combinator that wraps any function with a cache; we
apply it to a slow expensive function and watch the second call
become free; we tackle the trickier case of memoizing recursive
functions (which needs a small reshuffling of how the recursion
is written); we use the recursive form to make a naive O(2^n)
Fibonacci run in linear time, and a naive edit-distance function
run in polynomial time. We close on the catch that ties this
lecture and the previous one together: memoization only works for
pure functions.
A small helper for timing
To see speedups we need to measure them. A short helper:
time_it takes a thunk, runs it, prints the elapsed time in
milliseconds, and returns the result. Sys.time () returns the
processor time the program has used so far, in seconds (as a
float); the difference between two readings, scaled by 1000.,
is the elapsed milliseconds. Processor time is not the wall
clock, but for a single-threaded pure computation the two track
each other closely, and Sys.time works in the browser cells.
The %! in the format string flushes the output buffer so the
timing line appears immediately.
The slow timing demos in this lecture are not run when the page loads (each costs a second or more); click Run on a timing cell to reproduce the numbers on your own machine.
The memo combinator
A memoized version of f should behave just like f on every
input. On the first call with a given argument it computes the
answer and stashes it in a table; on every subsequent call with
the same argument it returns the stashed answer without
recomputing. The table is a Hashtbl
keyed by the argument, holding the answer as the value.
val memo : ('a -> 'b) -> 'a -> 'b = <fun>. The type signature
is identical to the function's own type: the wrapper hides the
cache and looks like a regular function from the outside.
A few things worth noticing in the implementation:
- The cache is created once, inside the call to
memo, and captured by the returned closure. Every call to the returned function shares the same cache. Hashtbl.find_optreturnsSome yfor a hit andNonefor a miss. We pattern-match on it; the hit returns the cached value, the miss computes-then-caches.- The cache lives behind a
ref-like internal:Hashtbl.addmutates it. Without the imperative toolkit from earlier in this module, we could not write this.
Memoizing an expensive identity
We need a function that is deterministic but slow. The canonical slow-but-pure function is naive recursive Fibonacci, which recomputes overlapping subproblems exponentially many times:
fib 37 does tens of millions of additions: a clearly
measurable fraction of a second in the browser. We use it as the
slow body of an identity-like function that returns its argument
after doing that work:
Two calls with the same argument; both run fib 37; both take
roughly the same time. slow_id does not know it has been asked
the same question twice.
Now wrap it:
The first call to memo_id 10 runs slow_id 10 (and hence
fib 37) and caches the result. The second call hits the cache
and returns instantly. Different inputs (10 vs 20) each get
one slow miss followed by free hits.
Where plain memo actually pays off
Is plain memo (without the recursive machinery we build next)
useful on its own? Yes. Memoizing a one-off call is pointless:
you pay the work once either way. But memo earns its keep
whenever an expensive pure function is called repeatedly with
arguments that repeat: the same lookup inside a loop, the same
sub-expression evaluated many times, the same key queried over
and over. The cache turns "once per call" into "once per
distinct argument."
Suppose we answer a batch of queries, and the batch has
duplicates. We compare mapping the raw slow_id against mapping
a freshly memoized copy (memo slow_id, evaluated once by
List.map so the whole batch shares one cache):
The plain version runs the slow body once per list element,
eight times. The memoized version runs it only for the two
distinct keys; the other six lookups are free. The speedup is
the ratio of total calls to distinct arguments, which grows as
the duplication grows. That is the everyday reason to reach for
memo.
(We write memo slow_id inside the List.map so each run
gets its own fresh cache; reusing the earlier memo_id, whose
cache already holds 10 and 20, would make the comparison
read as zero slow runs.)
The wrinkle with recursive functions
Now the more interesting case. We used naive fib above as a
black-box slow function; this time we want to memoize fib
itself. Recall its definition:
fib 25 is fine. fib 35 takes seconds. fib 40 takes minutes.
The reason: fib n calls fib (n-1) and fib (n-2); each of
those calls fib (n-2), fib (n-3), fib (n-3), fib (n-4);
the recursion tree branches and recomputes overlapping
subproblems exponentially many times.
This should be the classic case for memoization. The same arguments come up over and over; if we cached each, the total work would drop to O(n). Try the obvious:
The first call is still exponential; only the outer call to
memo_fib_outer 35 is cached. The internal recursive calls go
back to the original fib, not the memoized version. The
problem: fib's body says fib (n-1) + fib (n-2); the names
fib inside the body are bound at definition time, to the
non-memoized version.
To fix this we have to rewrite fib so the recursive call site
is abstracted out: instead of calling itself by name, it calls
a function we pass in. This is called open recursion.
val fib_open : (int -> int) -> int -> int = <fun>. The first
argument is the "recursive call" function; the second is the
input. fib_open does not call itself by name at all: where the
closed fib wrote fib (n - 1), the open version writes self (n - 1). To recover the ordinary recursive fib you would pass
fib_open a self equal to fib itself, which is exactly what
let rec does automatically.
The trick: we get to choose what self is. If self is the
memoized version, every internal call hits the cache too. That
choice is what plain memo fib could not make, and it is what
memo_rec (next) will make for us.
Tying the recursive knot
The classical trick (this is subtle; read it twice). We want a
function f such that f n = fib_open f n, but where every
call to f first checks the cache. We build it in three steps:
- Create a
refholding a dummy function. We will fill in the real one in step 3. - Define the memoized function. When it needs to recurse, it reads from the ref and calls that. Inside the body, the ref does not yet hold the real function; that is fine, because the read happens when the function is called, not when it is defined.
- Update the ref to point at the memoized function we just built.
val memo_rec : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun>.
Reading it carefully:
dummyis a placeholder that crashes if called;self_refstarts pointing at it. By the time anything actually calls through the ref, we will have replaced it (step 3).f_memoismemoapplied to a function that takesx, looks up the current value ofself_ref, and callsf_openwith that as the recursive function. The!self_refdereference happens at call time, by which point the ref has been updated.- The last step writes
f_memoitself into the ref. Now!self_refreturns the memoized function, and the open recursion inf_openbecomes a recursion through the cache.
The ref is being used to tie a recursive knot between two
definitions that cannot be written with plain let rec (because
memo does not have the right shape for let rec to recurse
through it).
It looks like a magic trick the first time you see it. The two things to keep straight:
- Open recursion is a rewrite. Instead of
fibcalling itself by name, it takes its own recursive call as a parameter (self). This is a mechanical edit. - The ref is the trick. OCaml's
let reccannot tie the knot for us becausememosits in the middle, andmemois not transparent tolet rec. The ref lets us refer to the final function from inside the function's body, after the fact.
Memoized Fibonacci
The payoff:
This one is cheap enough to run live on the page. fib_memo 30
is fast: the recursive structure visits each
sub-Fibonacci once, caches it, and the work collapses to O(n).
The repeated call is faster still: every node is already cached.
fib_memo 35 is also fast: it only computes the few nodes not
already in the cache (35, 34, 33, ..., 31), then reuses the
sub-results from the earlier call.
Compare against the naive fib: where the naive function
exploded at n = 35, the memoized one is fast even for n in
the hundreds (subject to integer overflow: fib exceeds the
native 63-bit int around n = 90, and the browser's 32-bit
int much sooner, around n = 45).
A dynamic-programming case: edit distance
Memoization is the engine behind most of dynamic programming. The name "dynamic programming", by the way, means nothing: its inventor Richard Bellman admitted in his autobiography that he chose it at RAND in the 1950s because the Secretary of Defense, Charles Wilson, was openly hostile to "research", and Bellman needed a name "not even a Congressman could object to". "Dynamic" sounded impressive, "programming" meant planning, and the funding survived. The technique, as you are about to see, is just memoized recursion (Dreyfus's retelling has the full story).
Edit distance (also known as Levenshtein distance) between two strings is the minimum number of single-character insertions, deletions, or substitutions needed to turn one into the other. The recurrence:
\[ d(s, t) = \begin{cases} |t| & \text{if } |s| = 0 \\ |s| & \text{if } |t| = 0 \\ \min \{ d(s', t)+1,\ d(s, t')+1,\ d(s', t') + c \} & \text{otherwise} \end{cases} \]
where s' and t' are s and t with their last characters
dropped, and c is 0 if the last characters of s and t
agree (no substitution) or 1 otherwise.
A direct translation:
edit_dist ("kitten", "sitting") is 3. But try
edit_dist ("kitten 4.08", "sitting 4.08") and the call takes
forever: the recursion tree branches three ways and revisits
the same prefix pairs over and over.
The same fix from fib. Rewrite in open-recursion form, then
hit with memo_rec:
Both runs now finish quickly. The memoized version computes
each (s', t') pair at most once; the total work is proportional
to the number of distinct sub-pairs, which is O(|s| * |t|). This
is exactly the dynamic-programming table you may have seen
filled in row by row in an algorithms class. The recursive
formulation plus memoization gives the same complexity without
the bookkeeping.
Memoization presumes purity
A caveat that ties memoization to laziness and to functional programming generally. The whole trick (cache the answer; on a repeat call, return the cached answer) is only sound if the function is pure: same input, same output, no side effects.
What happens if we memoize a function with side effects?
Without memoization, next () returns 1, 2, 3, ... on
successive calls. Suppose we memoize it. Hashtbl.find_opt
would key on (); the first call returns 1 and stashes it;
every subsequent call returns the cached 1 without running
the body. The side effect of incrementing the counter is lost
on every cache hit.
Or a function that reads a file:
let read_config () =
(* read and parse config.json from disk *)
...
Memoizing this freezes the answer the first time you call it. If the file changes on disk later, the cached version does not notice. That might be what you want (a deliberate "cache this once and don't re-read") or it might be a bug (you wanted to pick up changes). The point is that the cache cannot tell the difference, because the function's input did not change; only the world did.
The same caveat applies to lazy values from the
previous lecture. A
Lazy.t runs its body once and caches the result; if the body
has side effects, those run once and then never again. We saw
this on the slide for lazy (print_endline "running"; ...): the
print happened only on the first force.
OCaml does not track purity in the type system. (Haskell does; that is one of its defining features.) Memoizing in OCaml is a you-the-author judgement: you have to know your function is pure before you wrap it. The compiler will not catch a memoized side-effecting function for you; the program will just behave strangely.
A quick check
What does memo do on the first call with a new argument?
- Skips the call body and returns a default.
- Runs the body, stashes the result in the cache, returns it.
- Throws because the cache is empty.
- Computes the result twice for safety.
Why: A cache miss (the None branch of find_opt) runs
the wrapped function, stores the result in the hashtable, and
returns it. Subsequent calls with the same argument hit the
cache and skip the body.
Why does let memo_fib = memo fib not speed up the recursive
fib?
memoonly works on non-recursive functions.- The hashtable cannot hold integer keys.
- The recursive calls inside
fib's body refer to the originalfib, not the memoized one. memoresets the cache on every call.
Why: fib's body says fib (n - 1) + fib (n - 2). Those
internal fibs are bound at definition time to the unmemoized
function; only the outermost call goes through the cache. To
fix this we rewrite fib in open-recursion form (taking self
as a parameter) and use memo_rec to tie the knot.
Activity
Fill in the open-recursion binomial and the memoized version.
Keep the argument as a tuple (n, k) so the hashtable can key on
the pair.
Show reference solution
Reference solution:
The naive recurrence has overlapping subproblems: C(n-1, k-1)
and C(n-1, k) both recurse into C(n-2, k-1). memo_rec
collapses the duplicates to one call per (n, k) pair.
Show reference solution
What's next
That closes this module's small detour through laziness, streams, and memoization. The next lecture starts the second half of the module: OCaml's module system, the unit of code organisation at scale. Modules group related definitions, provide namespacing, and (with signatures) hide internals. The standard library you have been using all course is itself a tree of modules; we finally meet the language feature that builds it.
Reading
- Cornell CS3110, Memoization: https://cs3110.github.io/textbook/chapters/adv/memoization.html
- Real World OCaml, Memoization and dynamic programming: https://dev.realworldocaml.org/imperative-programming.html#scrollNav-4
Sources
This lecture follows the structure and worked examples of
IIT Madras CS3100, Streams, Laziness and Memoization
(KC Sivaramakrishnan, Monsoon 2020). The memo combinator,
the memo_rec knot-tying construction, and the Fibonacci /
edit-distance demonstrations are drawn directly from the source
lecture; prose is freshly authored for the NPTEL format. The
edit-distance code is adapted from Real World OCaml's
Imperative programming chapter (which itself credits the
standard dynamic-programming presentation).