if/then/else as an expression
This lecture introduces conditionals. The construct is familiar from
every language you have written before: a way to choose between two
courses of action depending on a boolean. The OCaml version of
if/then/else looks similar at first, but it has a property
that changes how you write code: in OCaml, if/then/else is an
expression that has a value, not a statement that just runs
something. This is a small syntactic difference with a big
consequence for how programs are structured.
If you arrived from C, Java, or Python, you have written code like
"declare a variable; do an if; assign the variable in each
branch." OCaml lets you collapse that pattern: bind the variable
directly to the result of the if. The code shrinks, the
intermediate state disappears, and the program becomes easier to read.
The cost is a single rule: both branches must produce values of the
same type. We will see why this rule is necessary and how to live
within it.
In C, if is a statement
In C and Java, conditionals do not have values. They are statements: blocks of code that get executed for their effects. Here is the canonical use:
int abs_val;
if (x < 0) abs_val = -x; else abs_val = x;
We declare abs_val first, then the if assigns to it. The
declaration and the assignment have to be separate, because if
is a statement: it does not produce a value you can use in the
declaration. The pattern is also fragile: abs_val is uninitialised
at declaration, and if some path through the code skipped the
assignment (impossible here, since exactly one branch of an if/else
runs, but easy to do with an if that has no else), the variable
would stay uninitialised. C compilers can often warn about this
using control-flow analysis, though not in every case; the usual
workaround is to give the variable a sentinel default.
Some C-family languages have a ternary operator cond ? a : b
that is an expression, returning either a or b. So you can
write int abs_val = (x < 0) ? -x : x in C or Java (JavaScript has
the same operator). The
ternary is OCaml's if-as-expression in disguise. But the ternary
is awkward to nest, and most C programmers write the statement
form for anything beyond simple cases.
In OCaml, if is an expression
OCaml has no separate ternary operator. Instead, the ordinary
if/then/else is itself an expression with a value. Take a function
abs_val whose entire body is if x < 0 then -x else x. That body
is an expression of type int, equal to -x when x < 0 and to
x otherwise. The function abs_val simply is that expression,
parameterised by x.
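As a sketch, the definition discussed above could be written as follows. The name abs_val and the body come from the text; nothing else is assumed.

```ocaml
(* The whole function body is a single if-expression of type int. *)
let abs_val x = if x < 0 then -x else x
```

There is no separate declaration and no assignment: the value of the if is the value of the function.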
This is more than a cosmetic change. Anywhere an expression can
go, an if can go: as a function argument, as the right-hand side
of a let, inside another expression. You will use this
constantly. Examples:
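A sketch of two such uses follows. The concrete condition 3 > 2 and the value n = -7 are assumptions, chosen to match the results described next.

```ocaml
(* An if-expression used directly as a function argument. *)
let () = print_endline (if 3 > 2 then "yes" else "no")

(* An if-expression on the right-hand side of a let. *)
let n = -7
let nat = if n < 0 then -n else n
```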
The first prints "yes". The second binds nat to 7 (the
non-negative number from n). The if-expression inside the
let is fine; we did not need a separate declaration.
The shape and type rule
The abstract syntax is:
\[ \mathtt{if}\ e_1\ \mathtt{then}\ e_2\ \mathtt{else}\ e_3 \]
with three sub-expressions:
- \(e_1\) is the condition: it must have type bool.
- \(e_2\) is the then-branch: it has some type \(t\).
- \(e_3\) is the else-branch: it must have the same type \(t\).
- The whole expression has type \(t\).
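A minimal instance of this shape, as a sketch. The else value 42 and the name thirteen are hypothetical, chosen only for illustration.

```ocaml
(* Condition of type bool; both branches of type int, so the result is an int. *)
let thirteen = if true then 13 else 42
```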
Result: int = 13. The condition true evaluates to true
(itself), the then-branch fires, the value is 13.
The "both branches must have the same type" rule is the part that catches new OCaml programmers, and it is worth understanding why the rule has to be this way.
Why the branches must agree
Suppose the then-branch is an int and the else-branch is the float constant 13.4. OCaml rejects the expression with:
Error: The constant 13.4 has type float
but an expression was expected of type int
The branches return different types: int in one case, float in
the other. The compiler cannot assign a single type to the whole
if-expression. If it accepted the program, the type would depend
on which branch ran at runtime: dynamic, not static. This goes
against the entire point of static typing (a program's types are
known before it runs).
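The fix is to bring both branches to the same type. Either make both floats (write the int branch as a float literal, or convert it with float_of_int), or make both ints (convert the float branch with int_of_float, which truncates toward zero).
The compiler will not pick for you. You decide which type you want and convert the other branch to match.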
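Both fixes can be sketched as below. The condition true and the names both_floats and both_ints are assumptions for illustration; float_of_int and int_of_float are the standard library conversions.

```ocaml
(* Rejected by the type checker: if true then 13 else 13.4 *)

(* Fix 1: both branches are floats. *)
let both_floats = if true then float_of_int 13 else 13.4

(* Fix 2: both branches are ints; int_of_float truncates 13.4 to 13. *)
let both_ints = if true then 13 else int_of_float 13.4
```

The two fixes give the expression different types (float versus int), which is why the choice has to be the programmer's.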
The rule generalises to anything, not just numbers. If the branches
return a string and an int, you get a type error. If they
return a list of int and a list of string, same thing. The
rule is "both branches the same type", full stop.
The typing rule, written out
Programming-languages people write typing rules with horizontal bars: the lines above the bar are premises and the line below is the conclusion.
\[ \dfrac{e_1 : \mathtt{bool} \qquad e_2 : t \qquad e_3 : t} {(\mathtt{if}\ e_1\ \mathtt{then}\ e_2\ \mathtt{else}\ e_3) : t} \]
Read as: if \(e_1\) has type bool, and \(e_2\) has type \(t\), and
\(e_3\) has type \(t\) (the same \(t\)), then the whole expression if e1 then e2 else e3 has type \(t\). This is the precise statement of
what the type checker is enforcing.
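As a worked instance (an illustration added here, assuming \(x\) has type int), the rule applied to the expression if x < 0 then -x else x reads:

\[ \dfrac{(x < 0) : \mathtt{bool} \qquad {-x} : \mathtt{int} \qquad x : \mathtt{int}} {(\mathtt{if}\ x < 0\ \mathtt{then}\ {-x}\ \mathtt{else}\ x) : \mathtt{int}} \]

Here \(t\) is int. All three premises hold, so the conclusion assigns type int to the whole expression.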
The evaluation rule comes in two halves, one per branch the condition might choose. Write \(e \to v\) for "\(e\) evaluates to \(v\)".
\[ \dfrac{e_1 \to \mathtt{true} \qquad e_2 \to v} {(\mathtt{if}\ e_1\ \mathtt{then}\ e_2\ \mathtt{else}\ e_3) \to v} \qquad \dfrac{e_1 \to \mathtt{false} \qquad e_3 \to v} {(\mathtt{if}\ e_1\ \mathtt{then}\ e_2\ \mathtt{else}\ e_3) \to v} \]
The left rule fires when the condition evaluates to true (the
then-branch supplies the value); the right rule fires when it
evaluates to false (the else-branch does). The branch that does
not fire is not evaluated; OCaml does not run dead code under
if.
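One way to observe that only the chosen branch runs is to count evaluations. This is a sketch; the counter evaluated and the helpers tick and safe_div are hypothetical names introduced for the demonstration.

```ocaml
let evaluated = ref 0

(* Hypothetical helper: records that it was called, then returns v. *)
let tick v =
  incr evaluated;
  v

(* Only the then-branch is evaluated, so tick runs exactly once. *)
let chosen = if true then tick 1 else tick 2

(* Guarding a division: a / b is never evaluated when b = 0. *)
let safe_div a b = if b = 0 then 0 else a / b
```

Because the unchosen branch is skipped, safe_div 10 0 returns 0 instead of raising Division_by_zero.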
You do not have to read these rules to use OCaml. They are useful notation when we need to be precise about exactly what the type checker does and what the program does. Module 4 (data types) and Module 5 (pattern matching) introduce more constructs with their own rules.
A typical use: multi-way branching
A chain of if/else if/else if/else lets you do multi-way
branching cleanly:
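One possible chain, as a sketch. The score of 85 and the grade thresholds are assumed values for illustration.

```ocaml
let score = 85

(* Conditions are tested top to bottom; the first true one picks the value. *)
let grade =
  if score >= 90 then "A"
  else if score >= 80 then "B"
  else if score >= 70 then "C"
  else if score >= 60 then "D"
  else "F"
```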
Result: string = "B". The chain is, formally, a deeply nested
single if-expression: if E1 then "A" else (if E2 then "B" else (if E3 then "C" else (if E4 then "D" else "F"))). OCaml allows
you to write else if without the parentheses to keep the chain
readable; semantically, each else if is just an if-expression
sitting in the else branch of the previous one.
For multi-way branching on a value's structure (rather than on
threshold comparisons), the better tool is
pattern matching (Module 5). Use
if chains when you have threshold comparisons or boolean
predicates; use match when you are unpacking a value.
if without else
You can write if cond then expr with no else. The omitted
else is treated as else (), the unit value.
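A sketch of a one-armed if used for a side effect. The warning text is an assumption; the type annotations are written out only to make the int -> unit signature visible.

```ocaml
(* The then-branch has type unit, matching the implicit else (). *)
let warn_if_negative (x : int) : unit =
  if x < 0 then print_endline "warning: negative input"
```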
The function warn_if_negative : int -> unit returns unit (no
useful value, like C's void). The branches must both be unit:
the then-branch prints (returns ()), and the implicit else is
(). For non-negative inputs, nothing is printed; the function
just returns ().
Use one-armed if only for side effects (printing, mutating). For
computing a value, you need both branches: the else has to
return something of the appropriate type, and there is no sensible
default for arbitrary types.
Nested ifs
Branches can themselves be if-expressions, naturally:
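A sketch with a hypothetical sign function, written once with the nested if parenthesised and once as an else if chain:

```ocaml
(* Nested form: the else branch is itself a parenthesised if-expression. *)
let sign_nested x =
  if x > 0 then 1 else (if x < 0 then -1 else 0)

(* Chain form: the same expression without the parentheses. *)
let sign_chain x =
  if x > 0 then 1 else if x < 0 then -1 else 0
```

The two definitions are the same expression and return the same result for every input.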
As we said: else if is sugar for else (if ... then ... else ...). Either form is fine; the unparenthesised form reads more
naturally for a chain.
A quick check
Which of the following OCaml expressions has type string?
- if x > 0 then "positive" else "non-positive"
- if x > 0 then "positive" else 0
- if x > 0 then "positive"
- if "x" > 0 then "yes" else "no"
Why: the first has both branches returning string (the
correct shape). The second mixes string and int branches:
type error. The third has no else, which OCaml treats as
else (); the then-branch would have to be unit, but it's
string: type error. The fourth has the condition "x" > 0,
which compares string to int: type error.
A code challenge:
Define max3 : int -> int -> int -> int that returns the
largest of three integers. Use only nested if/else; do not
call any library function.
Reference solution
One shape: pick the larger of a and b first, then compare
that against c. if a > b then (if a > c then a else c) else (if b > c then b else c). The whole expression is an int because
every branch is an int.
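Written out as a definition, that shape might look like the sketch below. The int annotations are added here so that the inferred type is int -> int -> int -> int; without them OCaml would infer a more general polymorphic type, because > works on any type.

```ocaml
(* Compare a with b first, then compare the winner against c. *)
let max3 (a : int) (b : int) (c : int) : int =
  if a > b then (if a > c then a else c)
  else (if b > c then b else c)
```

Ties need no special handling: when two arguments are equal, either one is a correct answer.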
Activity
Why does OCaml reject a function label whose body is if x > 0 then "positive" else 0?
- if cannot be used as the body of a function.
- "positive" is not a valid OCaml string literal.
- The two branches have different types (string and int).
- x > 0 is not a valid boolean expression.
Why: OCaml's if-expression requires both branches to have
the same type. Here the then-branch returns "positive" (a
string) and the else-branch returns 0 (an int). There is no
single type the whole expression could have, so the compiler
rejects it. To fix: decide whether label should return string
(replace 0 with "non-positive") or int (replace "positive"
with 1).
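The two repairs can be sketched side by side. The names label_string and label_int are hypothetical, used here only so both versions can coexist in one file.

```ocaml
(* Rejected: let label x = if x > 0 then "positive" else 0 *)

(* Repair 1: both branches are strings. *)
let label_string x = if x > 0 then "positive" else "non-positive"

(* Repair 2: both branches are ints. *)
let label_int x = if x > 0 then 1 else 0
```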
What's next
Module 2 ends with the tutorial, where we work through several small
problems end to end, combining literals, let, types, operators, and
if. After that, Module 3 starts on functions in depth: anonymous
functions, recursion, currying, partial application, and tail
recursion. The constructs introduced in Module 3 are the workhorses
of every OCaml program you will write.
Reading
- Cornell CS3110, Conditional expressions: https://cs3110.github.io/textbook/chapters/basics/expressions.html
- Real World OCaml, A Guided Tour (if-expressions section): https://dev.realworldocaml.org/guided-tour.html
Sources
This lecture's prose, worked examples, and quizzes are original to
this course. Materials referenced during preparation are listed in
the Reading section above; Cornell CS3110 and Real World OCaml
are CC BY-NC-ND-licensed and have not been derivatively reused.
See LICENSES.md
at the repository root for the full source posture.