Tutorial: an interpreter for the OCaml AST

Functional Programming with OCaml

Tutorial: an interpreter for the OCaml AST

Module 5 · Lecture 6

KC Sivaramakrishnan
IIT Madras

In the AST tutorial you built an algebraic data type for a tiny subset of OCaml: integer and boolean literals, addition, variables, and let ... in. We constructed example trees but never walked them. This lecture closes the loop. We will write three walkers over the same AST: a pretty printer, a depth function, and the centrepiece, an interpreter that evaluates an expression to a value. The interpreter is the natural home for almost everything we have seen in Module 5, from basic patterns to or-patterns to exhaustiveness.

We will extend the OCaml AST with one new constructor, If, so that the existing Bool constructor has somewhere to flow. Everything else stays as it was. The interpreter uses nothing beyond the language features introduced in Modules 1 through 5: pattern matching, recursion, option, lists. No higher-order functions, no exceptions, no modules.

This tutorial

The type, extended with If

The OCaml AST's final form from the previous tutorial, with If (cond, then, else) added so that Bool actually does something. The annotation on Let_in is ty option to preserve the make-illegal-states-unrepresentable choice from the AST tutorial: programs either supplied a type or did not.

type ty = | T_int | T_bool type expr = | Int of int | Bool of bool | Add of expr * expr | If of expr * expr * expr | Var of string | Let_in of string * ty option * expr * expr

Six constructors. Three of them carry payloads with sub- expressions (Add, If, Let_in), so they are recursive. The other three (Int, Bool, Var) are leaves at the AST level: they carry data but no sub-expressions. Every walker will have one clause per constructor, three of which recurse.

The AST

type ty = T_int | T_bool type expr = | Int of int | Bool of bool | Add of expr * expr | If of expr * expr * expr | Var of string | Let_in of string * ty option * expr * expr

A small example program to test our walkers on. In OCaml syntax:

let x : int = 10 in
if true then x + 5 else 0

The same in our AST:

let example = Let_in ("x", Some T_int, Int 10, If (Bool true, Add (Var "x", Int 5), Int 0))

Every constructor of expr appears at least once. The program should pretty-print back to something resembling the source, have a small finite depth, and evaluate to 15.

The example program

let x : int = 10 in
if true then x + 5 else 0

as an expr value:

let example = Let_in ("x", Some T_int, Int 10, If (Bool true, Add (Var "x", Int 5), Int 0))

Function 1: pretty

A pretty printer turns an expr back into a string. We will parenthesise every internal node so we never have to worry about operator precedence. Two clauses are interesting: If prints all three sub-expressions; Let_in has to decide whether to emit the optional type annotation.

let pretty_ty t = match t with | T_int -> "int" | T_bool -> "bool" let rec pretty e = match e with | Int n -> string_of_int n | Bool b -> string_of_bool b | Add (e1, e2) -> "(" ^ pretty e1 ^ " + " ^ pretty e2 ^ ")" | If (c, t, f) -> "(if " ^ pretty c ^ " then " ^ pretty t ^ " else " ^ pretty f ^ ")" | Var x -> x | Let_in (x, opt_ty, e1, e2) -> let annot = match opt_ty with | None -> "" | Some ty -> " : " ^ pretty_ty ty in "(let " ^ x ^ annot ^ " = " ^ pretty e1 ^ " in " ^ pretty e2 ^ ")" let _ = pretty example (* = "(let x : int = 10 in (if true then (x + 5) else 0))" *)

The Let_in clause uses an inner match on the ty option: present or absent, decided once, plugged back into the surrounding string. This is the pattern-inside-a-constructor shape, used here at the sub-result level rather than at the outer pattern.

pretty: one clause per constructor

let pretty_ty t = match t with T_int -> "int" | T_bool -> "bool" let rec pretty e = match e with | Int n -> string_of_int n | Bool b -> string_of_bool b | Add (e1, e2) -> "(" ^ pretty e1 ^ " + " ^ pretty e2 ^ ")" | If (c, t, f) -> "(if " ^ pretty c ^ " then " ^ pretty t ^ " else " ^ pretty f ^ ")" | Var x -> x | Let_in (x, opt_ty, e1, e2) -> let annot = match opt_ty with | None -> "" | Some ty -> " : " ^ pretty_ty ty in "(let " ^ x ^ annot ^ " = " ^ pretty e1 ^ " in " ^ pretty e2 ^ ")"

pretty example

let _ = pretty example (* = "(let x : int = 10 in (if true then (x + 5) else 0))" *)

Function 2: depth

depth returns the height of the AST. The three leaf constructors all return 1; we can fold them into a single clause with an or-pattern, since they share the same right-hand side.

let rec depth e = match e with | Int _ | Bool _ | Var _ -> 1 | Add (e1, e2) -> 1 + max (depth e1) (depth e2) | If (c, t, f) -> 1 + max (depth c) (max (depth t) (depth f)) | Let_in (_, _, e1, e2) -> 1 + max (depth e1) (depth e2) let _ = depth example (* = 4 *)

The or-pattern Int _ | Bool _ | Var _ collapses three clauses into one. The depth of example is 4: Let_in at the top, If underneath, Add underneath that, Var (or Int) at the bottom.

depth: or-pattern across the leaves

let rec depth e = match e with | Int _ | Bool _ | Var _ -> 1 | Add (e1, e2) -> 1 + max (depth e1) (depth e2) | If (c, t, f) -> 1 + max (depth c) (max (depth t) (depth f)) | Let_in (_, _, e1, e2) -> 1 + max (depth e1) (depth e2)

depth example

let _ = depth example (* = 4 *)

Function 3: eval

eval is the heart of the tutorial. Given an expression and an environment mapping variable names to values, return the value the expression reduces to.

A value is either an integer or a boolean. We model this with a new variant; we cannot reuse OCaml's own int or bool directly, because the same eval has to return both kinds of result depending on the expression. An environment is a list of (name, value) pairs:

type value = | VInt of int | VBool of bool type env = (string * value) list

eval can fail. If we ask for Add (Bool true, Int 1) the expression is type-incorrect and we have no answer. We have not covered exceptions yet (Module 7), so we return value option: Some v on success, None on any runtime mismatch. That matches the make-illegal-states-unrepresentable slogan from Module 4: the type forces the caller to handle the failure.

We also need to look up a variable in the environment. A plain recursive function over the list; returns None if the name is absent:

let rec lookup x (e : env) = match e with | [] -> None | (k, v) :: rest -> if x = k then Some v else lookup x rest

eval: values, environment, and lookup

type value = VInt of int | VBool of bool type env = (string * value) list let rec lookup x (e : env) = match e with | [] -> None | (k, v) :: rest -> if x = k then Some v else lookup x rest

Now eval itself. One clause per constructor of expr. The recursive cases use nested matches on the sub-results to propagate None and to enforce the expected value shape (an Add needs two VInts, an If needs a VBool condition).

let rec eval (env : env) e = match e with | Int n -> Some (VInt n) | Bool b -> Some (VBool b) | Add (e1, e2) -> (match eval env e1, eval env e2 with | Some (VInt a), Some (VInt b) -> Some (VInt (a + b)) | _ -> None) | If (c, t, f) -> (match eval env c with | Some (VBool true) -> eval env t | Some (VBool false) -> eval env f | _ -> None) | Var x -> lookup x env | Let_in (x, _, e1, e2) -> (match eval env e1 with | Some v -> eval ((x, v) :: env) e2 | None -> None) let _ = eval [] example (* = Some (VInt 15) *)

Reading the clauses:

eval [] example returns Some (VInt 15).

eval: the easy cases

let rec eval (env : env) e =
  match e with
  | Int n  -> Some (VInt n)
  | Bool b -> Some (VBool b)
  | Add (e1, e2) ->
    (match eval env e1, eval env e2 with
     | Some (VInt a), Some (VInt b) -> Some (VInt (a + b))
     | _ -> None)
  | ...

eval: If, Var, Let_in

  | If (c, t, f) ->
    (match eval env c with
     | Some (VBool true)  -> eval env t
     | Some (VBool false) -> eval env f
     | _ -> None)
  | Var x -> lookup x env
  | Let_in (x, _, e1, e2) ->
    (match eval env e1 with
     | Some v -> eval ((x, v) :: env) e2
     | None   -> None)

eval, assembled

let rec eval (env : env) e = match e with | Int n -> Some (VInt n) | Bool b -> Some (VBool b) | Add (e1, e2) -> (match eval env e1, eval env e2 with | Some (VInt a), Some (VInt b) -> Some (VInt (a + b)) | _ -> None) | If (c, t, f) -> (match eval env c with | Some (VBool true) -> eval env t | Some (VBool false) -> eval env f | _ -> None) | Var x -> lookup x env | Let_in (x, _, e1, e2) -> (match eval env e1 with | Some v -> eval ((x, v) :: env) e2 | None -> None) let _ = eval [] example (* = Some (VInt 15) *)

More examples

Now that eval is defined, let us run it on a few small expressions. For each one we show the pretty output (so you can read the program as surface syntax) and the eval result.

let ex1 = Add (Int 2, Int 3) let _ = pretty ex1 (* = "(2 + 3)" *) let _ = eval [] ex1 (* = Some (VInt 5) *) let ex2 = Let_in ("x", None, Int 10, Add (Var "x", Int 1)) let _ = pretty ex2 (* = "(let x = 10 in (x + 1))" *) let _ = eval [] ex2 (* = Some (VInt 11) *)

Let_in extends the environment with ("x", VInt 10) :: [], then evaluates the body under that env. Var "x" finds it via lookup.

let ex3 = If (Bool true, Int 42, Add (Int 1, Int 1)) let _ = pretty ex3 (* = "(if true then 42 else (1 + 1))" *) let _ = eval [] ex3 (* = Some (VInt 42) *)

The condition Bool true evaluates to VBool true, so the then-branch wins; the else-branch is never touched.

eval on a few small programs

let ex1 = Add (Int 2, Int 3) let _ = eval [] ex1 (* = Some (VInt 5) *) (* pretty: (2 + 3) *) let ex2 = Let_in ("x", None, Int 10, Add (Var "x", Int 1)) let _ = eval [] ex2 (* = Some (VInt 11) *) (* pretty: (let x = 10 in (x + 1)) *) let ex3 = If (Bool true, Int 42, Add (Int 1, Int 1)) let _ = eval [] ex3 (* = Some (VInt 42) *) (* pretty: (if true then 42 else (1 + 1)) *)

When eval returns None

The interpreter is total in the sense that it never raises an exception, but it does report failure with None. There are two ways for an expr to fail at runtime: a value of the wrong kind in an operator position, and a Var whose name is unbound in the current environment.

let bad1 = Add (Bool true, Int 1) let _ = pretty bad1 (* = "(true + 1)" *) let _ = eval [] bad1 (* = None *)

Add expects two VInt payloads, and the first sub-expression evaluates to a VBool. The nested match inside the Add clause falls through to _ -> None.

let bad2 = Var "y" let _ = pretty bad2 (* = "y" *) let _ = eval [] bad2 (* = None *)

lookup "y" [] walks the empty environment and returns None, which propagates to the top.

let bad3 = If (Int 1, Int 42, Int 0) let _ = pretty bad3 (* = "(if 1 then 42 else 0)" *) let _ = eval [] bad3 (* = None *)

The condition is an integer, not a boolean. The nested match inside the If clause falls through to _ -> None.

When eval returns None

let bad1 = Add (Bool true, Int 1) (* "(true + 1)" *) let _ = eval [] bad1 (* = None *) let bad2 = Var "y" (* "y" *) let _ = eval [] bad2 (* = None *) let bad3 = If (Int 1, Int 42, Int 0) (* "(if 1 then 42 else 0)" *) let _ = eval [] bad3 (* = None *)

Two looming questions

There are two things to be uncomfortable about with this interpreter, and both have proper answers in Module 8.

Why so much None-bookkeeping? Look at every recursive clause: Add, If, Let_in. Each one does the same dance: evaluate a sub-expression, pattern-match on the Some _ / None result, propagate None if it appears. The clauses are near-copies of each other with different right-hand sides. This boilerplate is exactly what the option monad captures, with a let* operator that compresses each nested match to one line. We take this same interpreter and rewrite it with let* in the option-monad lecture. The shape stays the same; the noise disappears.

Why can eval fail at all? Look at bad1: Add (Bool true, Int 1) is a well-typed OCaml value of type expr, but it represents a program that makes no sense. The OCaml type system cannot see this, because our expr lumps integer expressions and boolean expressions together. GADTs, in the GADT lectures, let you index expr by what it produces (int vs bool). With that indexing, the type system rules out Add (Bool true, _) at compile time, and the option return type disappears entirely.

For Module 5, we accept the boilerplate and the runtime failure as a fact of life. Both get repaired by the end of Module 8.

Two looming questions

The meta-pattern

Three different walkers, one shared skeleton. Each function has exactly one clause per constructor. The base cases are the leaves; the recursive cases recurse on the sub-expressions and combine the results.

Every walker on expr has the same shape

let rec f e =
  match e with
  | Int _ -> <leaf answer>
  | Bool _ -> <leaf answer>
  | Add (e1, e2) -> <combine f e1 and f e2>
  | If (c, t, f') -> <combine f c, f t, f f'>
  | Var _ -> <leaf answer>
  | Let_in (_, _, e1, e2) -> <combine f e1 and f e2>

The type definition gives the template; you fill in the actions. The compiler will complain if you skip a constructor. After Module 5 this should be muscle memory: "I have a recursive ADT; my function has one clause per constructor; I recurse on the sub-expressions and combine."

Two checks

question: | Why does eval return value option rather than value? options:

question: | In the Add case, why do we match on the pair (eval env e1, eval env e2) instead of nesting two separate matches? options:

Extend expr with a new constructor Sub of expr * expr and complete eval's Sub clause. The starter redefines expr with Sub and repeats eval with the Sub clause left as a TODO; the chapter's ty, value, env, and lookup are still in scope.

type expr = | Int of int | Bool of bool | Add of expr * expr | Sub of expr * expr | If of expr * expr * expr | Var of string | Let_in of string * ty option * expr * expr let rec eval (env : env) e = match e with | Int n -> Some (VInt n) | Bool b -> Some (VBool b) | Add (e1, e2) -> (match eval env e1, eval env e2 with | Some (VInt a), Some (VInt b) -> Some (VInt (a + b)) | _ -> None) | Sub (e1, e2) -> failwith "TODO: the Sub clause" | If (c, t, f) -> (match eval env c with | Some (VBool true) -> eval env t | Some (VBool false) -> eval env f | _ -> None) | Var x -> lookup x env | Let_in (x, _, e1, e2) -> (match eval env e1 with | Some v -> eval ((x, v) :: env) e2 | None -> None)
Show reference solution

The Sub clause mirrors Add with the operator swapped: evaluate both children, require two VInts, rebuild with a - b, and collapse everything off-shape to None.

Activity

The point of the activity is to see the refactoring-with-the-compiler loop end to end. Add two constructors. Build. Read the warnings. Update each match site.

Activity

Add Sub of expr * expr and Mul of expr * expr to expr.

Run the build before reading on. The compiler should report three sites that need attention.

Show reference solution

Activity solution: which matches need updating

Adding Sub and Mul:

type expr = | Int of int | Bool of bool | Add of expr * expr | Sub of expr * expr | Mul of expr * expr | If of expr * expr * expr | Var of string | Let_in of string * ty option * expr * expr
  • pretty, depth, eval: each gets warning 8, naming Sub _ and Mul _ as missing.
  • lookup is untouched: it walks an env, not an expr.
  • The compiler points at every match site, not at every file.

Activity solution: depth and eval

(* depth: extend the binary or-pattern *)
| Add (e1, e2) | Sub (e1, e2) | Mul (e1, e2) ->
    1 + max (depth e1) (depth e2)

(* eval: one clause per new operator *)
| Sub (e1, e2) ->
  (match eval env e1, eval env e2 with
   | Some (VInt a), Some (VInt b) -> Some (VInt (a - b))
   | _ -> None)
| Mul (e1, e2) ->
  (match eval env e1, eval env e2 with
   | Some (VInt a), Some (VInt b) -> Some (VInt (a * b))
   | _ -> None)
  • depth's or-pattern grows naturally: same right-hand side, three alternatives.
  • eval's Sub and Mul are near-copies of Add with the operator swapped. The repetition is real; we tame it in Module 6.

What's next

This tutorial closes Module 5. You have seen pattern matching on flat values (Lecture 1), on recursive structures (Lecture 2), inside other patterns (records, inline records, the diagonal idiom, or-patterns) (Lecture 3), constrained by guards (Lecture 4), and under the exhaustiveness checker (Lecture 5). The interpreter above brings all of those together on a single recursive ADT.

What you should know after Module 5

Common pitfalls of interpreters

A handful of mistakes catch almost everyone the first time they write an interpreter.

Common pitfalls of interpreters

Reading

Sources

The AST in this lecture is the one we built in the AST tutorial, extended with a single If constructor. The interpreter, its environment representation, and the worked walkers (pretty, depth, eval) are original to this course. The "refactoring with the compiler" idea is folklore in the OCaml/SML community and is also discussed in Cornell CS3110, chapter on pattern matching.