Literals
Expressions
Almost every meaningful piece of an OCaml program is an
expression: a piece of syntax that the language evaluates to
produce a value. (Top-level declarations like let x = e bind
names; the interesting part of each declaration is the expression
e on the right-hand side.) An expression has two things,
syntax (how you write it) and semantics (what it means). The
semantics in turn split in two:
- Static semantics are the type-checking rules. Before any evaluation happens, OCaml checks the expression and either produces a type or rejects it with an error message.
- Dynamic semantics are the evaluation rules. If type-checking succeeded, OCaml then evaluates the expression to produce a value (or it raises an exception, or it runs forever).
Crucially, evaluation rules apply only to expressions that type-check. This is the dividing line between static and dynamic typing: statically typed languages like OCaml refuse to run an ill-typed expression, whereas dynamically typed languages (Python, JavaScript) start running anyway and may discover the type mismatch only at runtime.
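A small sketch of the two phases, under stated assumptions: the names ok and safe_div are hypothetical, chosen only for illustration, and the ill-typed line is kept inside a comment because the compiler would otherwise reject the whole file.

```ocaml
(* Static semantics: the line below is rejected before anything runs,
   even though the else branch could never be taken:

     let bad = if true then 1 else "one"

   Both branches of an if must have the same type. *)

(* This one type-checks (both branches are int) and evaluates to 1. *)
let ok = if true then 1 else 0

(* Type-checking does not promise a value: a / b has type int for any
   int arguments, yet evaluating it with b = 0 raises an exception. *)
let safe_div a b =
  try Some (a / b) with Division_by_zero -> None

let () =
  Printf.printf "ok = %d\n" ok;
  match safe_div 1 0 with
  | Some q -> Printf.printf "1 / 0 = %d\n" q
  | None -> print_endline "1 / 0 raised Division_by_zero"
```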
Values
A value is an expression that does not need any further
evaluation. The literal 5 is already a value: there is nothing
left to compute. The expression 2 + 3 is not a value: it still
has work to do, namely the dynamic semantics of +, which reduces
it to 5. Every successful evaluation in OCaml ends at a value.
Values form a subset of expressions: every value is an expression, but not every expression is a value. Evaluation is the process of taking an expression that is not yet a value and reducing it until it is one.
Literals
The simplest expressions are the ones that are already values:
they need no evaluation at all. We call those literals. OCaml's
primitive literal kinds are int, float, bool, char,
string, and unit. This lecture spends the bulk of its time on
the four that dominate everyday code: integers, floating-point
numbers, booleans, and strings. char (a single byte, written
'a') shows up briefly in the strings section, and unit (the
single value (), used as a placeholder when there is nothing
meaningful to return) was introduced in
the hello-world lecture
and returns in Module 3 alongside unit-taking functions. The
choice of "primitive" is the language
designer's: these are the kinds the compiler knows about
intrinsically, with dedicated syntax and built-in operators.
Every other value in the language, from a list of pairs to a
record of records, is ultimately built up out of literals like
these.
It is tempting to skip past this material as obvious; you have
written ints and strings in five other languages already. Resist
that urge. OCaml makes a number of deliberate choices in this corner
of the language that are different from what C, Python, or Java do,
and the reasons behind those choices reveal a lot about how the rest
of OCaml works. Why does 1 + 2.0 fail to compile? Why is there a
separate ^ for string concatenation when + was right there? Why
does = mean something different from ==? Each of these has an
answer that we will use again, in much heavier form, when we look
at functions, modules, and the type system.
If you are watching the video, the slides are the spine; the prose below is what you read once the video is over, the way you would read a textbook chapter on the same material.
The first of the primitive types, int, deserves a longer look. The
others are mostly variations on what you already expect, but OCaml's
integers come with one small surprise that explains a lot of design
choices later in the language.
Integers
OCaml's int is a machine integer: a fixed-width signed integer
that fits in a single register on whatever CPU you are running. On
the 64-bit machines that essentially every student has today, that
register is 64 bits wide. You might therefore expect OCaml's int
to give you the full 64-bit range, from -2^63 up to 2^63 - 1.
It does not. OCaml's int is 63 bits wide, with a range of about
-4.6 × 10^18 to 4.6 × 10^18.
Where did the missing bit go? The runtime stole it. OCaml needs a
way, at runtime, to tell an immediate integer apart from a pointer
to a heap-allocated object. The garbage collector, in particular,
has to walk every value the program is holding and decide whether
to follow it as a pointer or treat it as an inline scalar. The
trick OCaml uses is to set the low bit of every immediate integer
to 1, and arrange the heap so that every pointer is even (its low
bit is 0). One bit-test then suffices to classify any word in
memory. That stolen low bit costs us one bit of integer range, but
it makes the GC fast and predictable, and it is part of why OCaml
programs can run within a small constant factor of equivalent C
code.
You will not see this tagging in your code: it happens entirely at
the runtime level. The only place it surfaces for the programmer
is the slightly narrower int range. If you really need a true
64-bit integer, the standard library has a separate Int64 module;
for arbitrary-precision integers, there is the
Zarith library. For the first
half of this course we will not need either.
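To see the range on your own machine, something like the following should do. It assumes a 64-bit native or bytecode OCaml, where Sys.int_size reports 63; the printed numbers differ on other platforms.

```ocaml
let () =
  (* Sys.int_size is the number of bits in an int: 63 on a 64-bit system. *)
  Printf.printf "int size : %d bits\n" Sys.int_size;
  Printf.printf "min_int  : %d\n" min_int;
  Printf.printf "max_int  : %d\n" max_int;
  (* Int64 literals carry an L suffix and use the Int64 module's functions
     rather than the ordinary + operator. *)
  Printf.printf "Int64 max: %Ld\n" Int64.max_int;
  Printf.printf "Int64 sum: %Ld\n" (Int64.add 40L 2L)
```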
OCaml lets you write integer literals in four bases. The default is
decimal, but a 0x prefix denotes hexadecimal, 0o denotes octal,
and 0b denotes binary. The compiler reads all four and produces
the same int value.
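A short illustration of the four notations. The value 255 and the variable names are arbitrary choices made for this sketch.

```ocaml
let () =
  (* The same number, 255, written in all four bases. *)
  let decimal = 255 in
  let hex = 0xFF in
  let octal = 0o377 in
  let binary = 0b1111_1111 in
  assert (decimal = hex && hex = octal && octal = binary);
  (* Underscores are ignored by the compiler; they only help the reader. *)
  assert (1_000_000 = 1000000);
  Printf.printf "%d %d %d %d\n" decimal hex octal binary
```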
The underscore in 1_000_000 is a small but worthwhile convention:
it is purely visual, the compiler discards it entirely, and it
makes large numeric constants enormously easier to read. The same
convention exists in Java 7+, Python 3.6+, and Rust. You can place
the underscores wherever you like; 1_0_0_0_0_0_0 is also a million,
just an unkind one to your reader. The most common groupings are by
three digits for decimal numbers and by bytes for hex masks.
The five integer operators that come built in are +, -, *,
/, and mod. The first three behave exactly as you expect. The
last two, / and mod, have one subtlety worth understanding now,
because it differs from Python (the language students most often
arrive in OCaml from).
The / operator on int is truncating integer division: it
divides exactly and then throws the fractional part away. So
17 / 5 is 3, with the 0.4 discarded. The companion mod
operator returns what was discarded, scaled to an integer: 17 mod 5
is 2, because 17 = 3 * 5 + 2. The identity a = (a / b) * b + (a mod b)
holds for every a and every nonzero b, whatever their signs.
For negative operands, OCaml truncates toward zero, not toward
minus infinity. So (-17) / 5 is -3 in OCaml, and (-17) mod 5 is -2:
the remainder takes the sign of the dividend. Python's integer
division operator //, in contrast, rounds toward minus infinity, so
-17 // 5 in Python gives -4. There is no universal right answer here; both
languages picked a convention and stuck with it. The OCaml
convention matches C and Java; the Python convention is mathematically
cleaner for some applications. The practical advice is: if you find
yourself doing arithmetic on signed integers near zero, write the
answer out for a couple of inputs and check that you have the
convention you wanted.
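Following that advice, here is one way to write the answers out. The helper floor_div is hypothetical (it is not part of the standard library) and is included only to show how the rounding-toward-minus-infinity convention could be recovered when it is the one you want.

```ocaml
(* Hypothetical helper: integer division that rounds toward minus
   infinity. It adjusts the truncated quotient by one when the division
   is inexact and the operands have opposite signs. *)
let floor_div a b =
  let q = a / b in
  if a mod b <> 0 && (a < 0) <> (b < 0) then q - 1 else q

let () =
  (* Positive operands: quotient and remainder as expected. *)
  assert (17 / 5 = 3);
  assert (17 mod 5 = 2);
  (* Negative dividend: the quotient truncates toward zero, and the
     remainder takes the sign of the dividend. *)
  assert ((-17) / 5 = -3);
  assert ((-17) mod 5 = -2);
  (* The identity a = (a / b) * b + (a mod b), checked on a few pairs. *)
  List.iter
    (fun (a, b) -> assert (a = (a / b) * b + (a mod b)))
    [ (17, 5); (-17, 5); (17, -5); (-17, -5) ];
  assert (floor_div (-17) 5 = -4);
  print_endline "division checks passed"
```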
Integer overflow in OCaml is silent. max_int + 1 does not raise
an exception or produce a runtime error; it wraps around, and the
language defines that wrap-around as the result. This is a
deliberate choice for performance. C is different in a way worth
flagging: signed-integer overflow in C is undefined behaviour,
so the compiler may assume it never happens (a distinction a later
module returns to). If you are doing arithmetic where overflow
might happen and would matter, the discipline is: use a wider
type (Int64) or check explicitly.
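One possible shape for "check explicitly" is sketched below. The function checked_add is a hypothetical helper written for this example, not a standard-library function, and returning an option is just one of several reasonable designs.

```ocaml
(* Hypothetical helper. Overflow can only happen when both operands have
   the same sign and the wrapped sum ends up with the opposite sign. *)
let checked_add a b =
  let sum = a + b in
  if (a >= 0) = (b >= 0) && (sum >= 0) <> (a >= 0) then None
  else Some sum

let () =
  (* Silent wrap-around: one past the largest int is the smallest int. *)
  assert (max_int + 1 = min_int);
  assert (checked_add max_int 1 = None);
  assert (checked_add min_int (-1) = None);
  assert (checked_add 40 2 = Some 42);
  print_endline "overflow checks passed"
```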
The stakes of "check explicitly" can be absolute. On 4 June 1996, the maiden Ariane 5 rocket tore itself apart 37 seconds after launch, taking roughly 370 million dollars of vehicle and satellites with it. The guidance software, reused from Ariane 4, converted a 64-bit float (the horizontal velocity) into a 16-bit integer; on Ariane 5's faster trajectory the value no longer fit, and the range check had been deliberately omitted for performance, because analysis of the old rocket had shown the overflow could never happen. The conversion trapped, both redundant guidance units ran the same code and failed the same way, and the rocket veered and broke up. Numeric conversions are where two representations meet; the inquiry report is a classic precisely because every ingredient was reasonable on its own.
Floating-point numbers
OCaml's float is exactly IEEE 754 double precision: 64 bits, with
1 sign bit, 11 exponent bits, and 52 fraction bits. This is the same
representation that C calls double, that JavaScript uses for all
numbers, and that almost every modern language uses for its default
floating-point type. The range is roughly ±10^308, with about 15
to 17 significant decimal digits of precision.
There is one small but very firm syntactic rule: a float literal
must contain a decimal point or an exponent. Without either, the compiler reads the
number as an int. So 3. and 3.0 and 3.14 are all floats;
3 is an integer. The trailing dot after 3. is enough, even
without a digit after it.
Scientific notation works as it does in every other language:
2.71828e-1 is 2.71828 × 10^(-1), which is 0.271828. You can
write the exponent with a sign (e-1, e+5) or without (e10).
The rule is that either a decimal point or an exponent suffix
is enough to mark a literal as float. So 1.0, 1.0e10, and
1e10 are all floats; only 1 is an int. In practice, prefer
the decimal point: it makes the code easier to read.
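The literal forms described above can be checked directly. This is only a sketch; the particular numbers are arbitrary.

```ocaml
let () =
  (* A trailing dot alone makes the literal a float. *)
  assert (3. = 3.0);
  (* An exponent alone does too. *)
  assert (1e10 = 10_000_000_000.0);
  assert (1.0e10 = 1e10);
  (* 2.71828e-1 means 2.71828 times 10 to the power -1. *)
  Printf.printf "%g\n" 2.71828e-1
```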
Now to the design choice that catches every new OCaml programmer at
least once: the arithmetic operators on float are different
symbols from the ones on int. Floating-point addition is +.,
not +. Subtraction is -.. Multiplication is *.. Division is
/.. The trailing dot is part of the operator name, just like the
underscore in let_binding would be part of an identifier name.
In my experience, it takes about ten minutes of writing OCaml before
someone tries to write let area r = 3.14 * r * r and the
compiler refuses: The constant 3.14 has type float but an expression was expected of type int. The fix is to write
3.14 *. r *. r instead.
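A version of area that compiles might look like this. It swaps the literal 3.14 for the standard library's Float.pi, which is a choice made for this sketch rather than something the example requires.

```ocaml
(* The type of area is inferred as float -> float, because *. only
   accepts floats on both sides. *)
let area r = Float.pi *. r *. r

let () =
  Printf.printf "area 1.0 = %.4f\n" (area 1.0);   (* 3.1416 *)
  Printf.printf "2.5 +. 1.5 = %.1f\n" (2.5 +. 1.5);
  Printf.printf "7.0 /. 2.0 = %.1f\n" (7.0 /. 2.0)
```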
Why? Why not let + do the obvious thing depending on its
operands, the way Python and Java and JavaScript do? The answer
has two parts, one practical and one principled, and both worth
holding onto.
The practical part is that operator overloading is genuinely
expensive. In C++, when the compiler sees a + b, it has to
search for an operator+ that takes the types of a and b. If
several such operators are in scope, it has to apply overload
resolution rules to pick one. This makes both compilation slower
and error messages worse: a misplaced + can produce error
messages that talk about candidate overloads in libraries the
programmer has never heard of. Languages with simpler type systems,
like C and Java, get around this by baking the overloads into the
compiler: the compiler knows that + on two ints is one
instruction, on two doubles is a different one, and on a String
and anything is yet another. That is a workable design, but it
means you cannot decide for yourself, in your own code, what +
means on a new type you have written. OCaml takes the opposite
position: every operator has one meaning, fixed in the language,
and that meaning is determined by the operator symbol alone, not
by the types of its operands.
The principled part is reasoning. When you read OCaml code and
see a + b, you know, without checking anything else, that both
a and b are integers, and that the result is an integer add.
When you see a +. b, you know both are floats. That is one less
thing to verify in your head as you read code. We will come back
to this principle several times in the course: OCaml repeatedly
chooses more syntax, less ambiguity, and the dividend shows up
when you have to read someone else's code six months later.
The cost of this choice is that mixing numeric types requires an
explicit conversion. The function float_of_int turns an int
into a float; int_of_float does the reverse, truncating toward zero.
Both conversions are so common in numerical code that
the standard library also exposes them as Float.of_int and
Float.to_int, with friendlier names.
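A few conversions in action. The names count and unit_price are hypothetical, picked only to give the mixed-type arithmetic some context.

```ocaml
let () =
  assert (float_of_int 3 = 3.0);
  (* int_of_float truncates toward zero, for either sign. *)
  assert (int_of_float 3.9 = 3);
  assert (int_of_float (-3.9) = -3);
  (* The Float module offers the same conversions under other names. *)
  assert (Float.of_int 3 = 3.0);
  assert (Float.to_int 3.9 = 3);
  (* Mixing an int count with a float price needs one explicit step. *)
  let count = 4 and unit_price = 2.5 in
  Printf.printf "total = %.2f\n" (float_of_int count *. unit_price)
```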
One more property of float that is worth flagging now, because
students rediscover it the hard way: floating-point arithmetic is
approximate. The number 0.1 cannot be represented exactly in
binary floating point; neither can 0.2. So 0.1 +. 0.2 does
not give 0.3; it gives 0.300000000000000044. This is not
a bug in OCaml; it is a fundamental property of IEEE 754, and
the same anomaly appears in Python, Java, JavaScript, and
essentially every mainstream language. We saw the same example in
the Module 1 tutorial's float-precision aside.
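The usual workaround is to compare floats with a tolerance instead of with =. The helper approx_equal below is hypothetical, and its default tolerance of 1e-9 is an arbitrary choice for this sketch; a suitable tolerance depends on the magnitudes involved.

```ocaml
(* Hypothetical helper: true when a and b differ by less than eps. *)
let approx_equal ?(eps = 1e-9) a b = Float.abs (a -. b) < eps

let () =
  (* Print enough digits to expose the representation error. *)
  Printf.printf "%.18f\n" (0.1 +. 0.2);
  assert (0.1 +. 0.2 <> 0.3);
  assert (approx_equal (0.1 +. 0.2) 0.3)
```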
If you have not yet encountered the basics of floating-point
representation, the classic reference is
What Every Computer Scientist Should Know About Floating-Point
Arithmetic;
we will not need it again in this course, but it is worth knowing
it exists.
That 0.1 has no exact binary representation sounds like
trivia until it accumulates. On 25 February 1991, during the
Gulf War, a Patriot air-defence battery in Dhahran failed to
intercept an incoming Scud missile; 28 soldiers died in the
strike. The system counted time in tenths of a second, and each
stored 0.1 carried a tiny binary representation error; after
100 hours of continuous operation the clock had drifted by 0.34
seconds, which at Scud speeds moved the tracking gate more than
half a kilometre off the target. The
GAO investigation
traced the failure to exactly the arithmetic on this page.
Booleans
The bool type has exactly two values: true and false. There
is no concept of "truthy" or "falsy" values, as in Python where 0 and
"" count as false; an if or && or || expects a bool, full stop. A 0 is an int, not
a bool, and the compiler will reject if 0 then ... outright. As
with the numeric operators, this is OCaml again preferring more
syntax over more ambiguity: when you read if e then ..., you know
e evaluates to one of two values, not to an arbitrarily-typed
value with one of seven possible truthiness rules.
The boolean operators are && for conjunction, || for disjunction,
and not for negation. The familiar comparison operators =, <>,
<, <=, >, >= all return bool.
Both && and || short-circuit, exactly as in C and Java:
&& evaluates its right argument only if the left was true, and
|| evaluates its right only if the left was false. This lets
you safely write things like x <> 0 && y / x > 1: the division
is only attempted when x is nonzero. We will lean on this
behaviour later when we want to guard expensive computations.
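The guard from the paragraph above, wrapped in a function so it can be exercised. The name ratio_exceeds_one is hypothetical.

```ocaml
(* Hypothetical name for the guarded division described in the text. *)
let ratio_exceeds_one x y = x <> 0 && y / x > 1

let () =
  (* x = 0: the left side is false, so y / x is never evaluated and no
     Division_by_zero is raised. *)
  assert (ratio_exceeds_one 0 10 = false);
  assert (ratio_exceeds_one 2 10 = true);
  (* || stops as soon as the left side is true. *)
  let zero = 0 in
  assert (true || 1 / zero > 0);
  print_endline "short-circuit checks passed"
```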
We will look at the comparison operators properly in the
operators lecture,
where the structural vs physical equality distinction also
gets its own treatment. For now: use = for equality, the way you
would use == in C.
Strings
Strings in OCaml are sequences of bytes, written between double
quotes. They are immutable: once you have built a string, you
cannot modify a byte of it without explicitly converting through
the related type bytes. Most code never needs to do that, and so
treats strings as values, like integers: you build new ones from
old ones rather than mutating them in place.
A "byte string" is exactly that, a sequence of 8-bit bytes. OCaml's
string does not know about Unicode code points, or about encoding
in general. If your string contains the bytes that encode "café" in
UTF-8, then String.length reports 5 (the three ASCII letters plus
the two bytes that encode the accented "é"), not 4. For
Unicode-aware work the standard library is not enough; you reach
for an external library like uutf or uucp. Most code that just
concatenates, slices, or searches byte content does not need any of
that, and is perfectly happy with the byte view.
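The byte count can be checked without relying on how the source file itself is encoded. This sketch spells the accented letter out as its two UTF-8 bytes, using the hexadecimal byte escape.

```ocaml
let () =
  (* "café" as UTF-8 bytes: the accented letter is the two bytes C3 A9. *)
  let cafe = "caf\xc3\xa9" in
  assert (String.length cafe = 5);
  (* Indexing returns bytes, so position 3 holds the first of those two. *)
  assert (Char.code cafe.[3] = 0xC3);
  Printf.printf "%d bytes\n" (String.length cafe)
```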
The escape sequences inside string literals are the same family
you have seen in C: \n for newline, \t for tab, \\ for a
literal backslash, \" for a literal double quote. There are also
two ways to write an arbitrary byte: \NNN, where NNN is a
three-digit decimal number, or \xHH, where HH is two hex digits.
You will not need these often, but they are how you embed raw bytes
in a literal.
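A few of these escapes, checked by length and by content. The strings are arbitrary examples.

```ocaml
let () =
  (* A newline or tab escape occupies a single byte. *)
  assert (String.length "a\nb" = 3);
  assert (String.length "\t" = 1);
  (* An escaped backslash and an escaped double quote: one byte each. *)
  assert (String.length "\\\"" = 2);
  (* Decimal 065 and hex 41 both name the byte for the letter A. *)
  assert ("\065" = "A");
  assert ("\x41" = "A");
  print_endline "say \"hi\"\tthen a tab"
```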
String concatenation uses the operator ^, not +. The reason is
the same reason + does not work between int and float: in
OCaml, an operator has one meaning, fixed by the symbol. Numeric
addition is one operation; string concatenation is a different one;
they get different symbols. So "foo" ^ "bar" is "foobar". If
you want to build up a string from many pieces, the standard library
has String.concat, which takes a separator and a list of pieces
and is much faster for many small parts.
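Both ways of joining strings, side by side. The pieces are arbitrary.

```ocaml
let () =
  assert ("foo" ^ "bar" = "foobar");
  (* String.concat takes the separator first, then the list of pieces. *)
  assert (String.concat ", " [ "a"; "b"; "c" ] = "a, b, c");
  (* An empty separator simply joins the pieces. *)
  assert (String.concat "" [ "foo"; "bar"; "baz" ] = "foobarbaz");
  print_endline ("foo" ^ "bar")
```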
String.length takes a string and returns the number of bytes in
it. String.sub takes a string, a starting index, and a length,
and returns the substring at that range. Indexing is zero-based,
as in essentially every modern language. The standard library has
many other string functions (String.concat, String.trim,
String.split_on_char, String.uppercase_ascii, String.get for
single-character access, ...); we will meet them as needed in
later modules.
Out-of-bounds access raises an exception, Invalid_argument. We
will see how to handle exceptions
properly in Module 7; for now, just know that String.sub s i n
is a runtime error whenever i or n is negative or i + n exceeds String.length s.
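A sketch of String.sub in range and out of range. It uses a try ... with expression ahead of its proper introduction, purely so that the failing call does not stop the program.

```ocaml
let () =
  let s = "hello world" in
  assert (String.length s = 11);
  (* String.sub s start len: 5 bytes starting at index 6. *)
  assert (String.sub s 6 5 = "world");
  (* Asking for bytes past the end raises Invalid_argument. *)
  let result =
    try Some (String.sub s 6 50) with Invalid_argument _ -> None
  in
  assert (result = None);
  print_endline (String.sub s 0 5)
```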
Conversions between types
OCaml will not auto-convert between numeric types, between bool
and int, or between char and string. The standard library
provides explicit conversion functions wherever they make sense:
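A selection of these conversions, written as assertions so that each one states its own result. The selection is illustrative rather than exhaustive, and the input strings are arbitrary.

```ocaml
let () =
  (* int <-> string *)
  assert (string_of_int 42 = "42");
  assert (int_of_string "42" = 42);
  (* float <-> string *)
  assert (float_of_string "3.5" = 3.5);
  assert (string_of_float 3.5 = "3.5");
  (* bool <-> string *)
  assert (string_of_bool true = "true");
  assert (bool_of_string "false" = false);
  (* char <-> int (the byte code); for char -> string, String.make 1 c
     builds a one-character string. *)
  assert (int_of_char 'a' = 97);
  assert (char_of_int 97 = 'a');
  assert (String.make 1 'a' = "a");
  (* The _opt variants return None instead of raising on bad input. *)
  assert (int_of_string_opt "forty-two" = None);
  assert (bool_of_string_opt "True" = None);
  print_endline "conversion checks passed"
```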
These names follow a predictable pattern: xxx_of_yyy returns an
xxx given a yyy. The functions that parse from a string
(int_of_string, float_of_string, bool_of_string) raise an
exception if the string does not represent a value of that type.
bool_of_string is the strictest of the three: it accepts exactly
"true" and "false", nothing else.
Putting it together
Here is a function that uses three of the four primitive types we have seen:
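A minimal sketch of such a function. The name strength_label, the label strings, and the second threshold are hypothetical; only the len < 8 comparison is taken from the discussion that follows.

```ocaml
(* Hypothetical example: classify a password by its length.
   Inferred type: int -> string. *)
let strength_label len =
  if len < 8 then "weak"
  else if len < 12 then "fair"
  else "strong"

let () = print_endline (strength_label 10)
```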
Notice three things. First, the function takes an int (the
password length) and returns a string (the label). The compiler
figured this out automatically from the function body, because
the comparisons are against int literals and the then and
else branches return string literals. We did not have to write
a single type annotation. Second, the body is a chain of nested
if-then-else expressions, and the whole chain is one
expression. This is the same point we made earlier about OCaml
being expression-based: even something that looks like a multi-way
branch is a value-producing expression you can pass to a function
or bind to a name. We give if its
own dedicated lecture later in
this module. Third, every
comparison is against the same type: len < 8, where len is
int and 8 is int, never mixing int and float.
A C programmer reading this might object that the ifs could be
rewritten as a switch. In OCaml, the equivalent of switch is
match, which we will see in
Module 5 (it is the central tool of the language). But match is
overkill for a chain of threshold comparisons like this one; the
right tool here is a nested if, the same as in any other
language.
Common pitfalls
A short collection of mistakes I see beginners make in every cohort of this course. None are deep, but each costs about half an hour to recover from if you make it for the first time mid-assignment.
Pitfall 1: mixing int and float. The error message
The constant 2.0 has type float but an expression was expected of type int
(or its mirror, when the offending literal is an int) is the most
common compile error in week 1. Look
at the surrounding code and figure out which side is supposed to
be a float. Insert a float_of_int (or int_of_float) as needed.
Resist the urge to "fix" this by sprinkling dots randomly; understand
which side wanted which type.
Pitfall 2: using == for equality. It looks like the Java
operator; it is not. Use =. The compiler will not warn you;
== is a valid operator with a perfectly valid (just unhelpful)
meaning, so your code compiles and runs and produces wrong answers.
This is a habit you have to drill out from day one.
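The difference can be seen with two strings that hold the same bytes but were allocated separately. This is only a sketch; String.make is used here so that the two values are certain to be distinct heap blocks.

```ocaml
let () =
  (* Two separately allocated strings with identical contents. *)
  let a = String.make 3 'x' in
  let b = String.make 3 'x' in
  (* Structural equality compares contents. *)
  assert (a = b);
  (* Physical equality asks whether both are the very same heap block. *)
  assert (not (a == b));
  assert (a == a);
  print_endline "equality checks passed"
```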
Pitfall 3: forgetting the decimal point. let pi = 3 does
not produce a float; it produces an int named pi with value
3. Then later, when you write 2.0 *. pi, the compiler complains
about a type mismatch and you wonder why. Always write floating
constants with at least a trailing .: let pi = 3.14,
not let pi = 3. (Better still: use Float.pi from the standard
library.)
Pitfall 4: assuming string-on-string = is expensive. It is
linear in the length of the shorter string, the same as strcmp
in C, and the compiler is good about not doing redundant work.
You do not need to micro-optimise by comparing string lengths
first.
A quick check
What does 1 + 2.0 evaluate to?
- float = 3.0 (with an implicit cast)
- int = 3 (the 2.0 is truncated)
- Type error: + expects int on both sides; 2.0 is a float.
- Type error: 1 should have been 1.0.
Why: OCaml never inserts implicit conversions between int
and float. The operator + takes two ints and returns an
int; the second operand 2.0 is a float, so the compiler
rejects the expression with "the constant 2.0 has type float but
an expression was expected of type int." Both the int-side and
the float-side framings are wrong: there is no preferred side, the
language simply refuses the call and asks you to insert a
float_of_int (or int_of_float) where you intended.
What does (-7) / 2 evaluate to?
- int = -4 (floor division)
- int = -3 (truncation toward zero)
- float = -3.5
- exception Division_by_zero
Why: OCaml's integer division / truncates toward zero,
not toward negative infinity. (-7) / 2 = -3 (and (-7) mod 2 = -1). Python's integer division (-7 // 2) floors instead, giving
-4; OCaml's convention is the opposite one. The
result is an int because both operands are int and OCaml does
not implicitly promote to float.
(No code quiz here: function definitions arrive in Module 3. The Activity below stays at the "predict the type and value" level.)
Activity
The activity is a single question with two parts. Before reading
on, try it: predict the type and value of 3 / 2 and 3.0 /. 2.0.
The first, 3 / 2, has type int and value 1. Both operands
are int, so / is integer division. Integer division truncates,
so the answer is 1, not 1.5. Python 3 contrasts: there /
performs true division and would produce 1.5, even on integer
operands; you would have to write 3 // 2 to get the truncated
answer. So if you arrive in OCaml from Python 3, the convention is
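flipped relative to what you are used to. In Python 2, by the
way, / on two ints also gives 1 here, although Python 2 floors rather
than truncates, so the two languages still disagree on negative operands.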
The second, 3.0 /. 2.0, has type float and value 1.5. Both
operands are float, the operator is the float-division operator,
and the answer is what you expect.
If you tried to write 3 /. 2, OCaml would refuse: the operator
/. expects float on both sides. If you tried 3.0 / 2.0, OCaml
would also refuse: the operator / expects int. There is no
operator that takes mixed types in OCaml.
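The two answers, plus the explicit conversion you would need to divide two ints and keep the fraction. A sketch only; the operands are the ones from the activity.

```ocaml
let () =
  assert (3 / 2 = 1);
  assert (3.0 /. 2.0 = 1.5);
  (* To divide ints and keep the fraction, convert both sides first. *)
  assert (float_of_int 3 /. float_of_int 2 = 1.5);
  Printf.printf "%d %.1f\n" (3 / 2) (3.0 /. 2.0)
```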
What's next
Once you have literals, the immediate next question is: how do I
give them names? Repeatedly writing 3.14159 in code is not a
sustainable plan. The next lecture
covers let bindings, which let you name a value and reuse it.
They will also let us write local definitions inside an expression,
the first step toward structuring real programs.
Reading
- Real World OCaml, A Guided Tour (numbers section): https://dev.realworldocaml.org/guided-tour.html
- Cornell CS3110, Basic types and values: https://cs3110.github.io/textbook/chapters/basics/expressions.html
Sources
This lecture's prose, worked examples, and quizzes are original to
this course. Materials referenced during preparation are listed in
the Reading section above; Cornell CS3110 and Real World OCaml
are CC BY-NC-ND-licensed and have not been derivatively reused.
See LICENSES.md
at the repository root for the full source posture.