Portability: data-race freedom across domains

Functional Programming with OCaml

Portability: data-race freedom across domains

Module 11 · Lecture 5

KC Sivaramakrishnan
IIT Madras

The previous lecture met contention: once a value is shared across domains, its mutable fields are off-limits (no writes, and no reads either), unless the access goes through a synchronised type like an atomic.

But contention alone does not finish the data-race story. It governs how a shared value may be touched; it does not say which values become shared in the first place. That is the other half of a race, ingredient 2 from the previous lecture's four-ingredient analysis: "a shared memory location."

The portability axis closes that half. It asks "can this value safely cross a domain boundary?" If a closure would let two domains race on captured mutable state, the answer is no, and the type checker rejects any attempt to send it to another domain. The race never reaches runtime, because the program never compiles. By the end of this lecture the two axes click together into a two-step argument that makes data-race freedom a compile-time guarantee.

The other half: crossing the boundary

Where we are

The motivating example: a parallel gensym race

The cleanest motivation is a small systems chore: generating unique symbols. A sequential gensym is a one-liner: hold a counter in a ref, increment on every call, format the result. You have built this closure before: it is the ticket dispenser from the imperative half of the course, with a prefix glued on.

let gensym = let count = ref 0 in fun prefix -> count := !count + 1; prefix ^ "_" ^ string_of_int !count let _ = gensym "x" (* = "x_1" *) let _ = gensym "y" (* = "y_2" *)

Sequential, this is fine. The counter is incremented on every call; the calls cannot overlap; every result is unique.

Now move to two domains. Each domain calls gensym concurrently in a hot loop. All four ingredients from the previous lecture's recall are present:

  1. two domains running in parallel,
  2. a shared memory location: the count ref,
  3. both domains write: each call does count := !count + 1,
  4. nothing orders the accesses: no join, no lock, no atomic; count is a plain ref.

The runtime behaviour: lost updates, duplicate symbols, occasional crashes if the runtime is unlucky. Standard OCaml will compile the program and let you find out the hard way.

(* In OCaml 5 today, this compiles, and on a multicore machine it races at runtime. (This browser toplevel is single-domain, so the interleaving cannot manifest here; the point is that the vanilla API accepts the program.) *) [@@@alert "-do_not_spawn_domains"] [@@@alert "-unsafe_multidomain"] let d1 = Domain.spawn (fun () -> for _ = 1 to 1_000 do ignore (gensym "x") done) let d2 = Domain.spawn (fun () -> for _ = 1 to 1_000 do ignore (gensym "y") done) let () = Domain.join d1; Domain.join d2

The gensym race

let gensym = let count = ref 0 in fun prefix -> count := !count + 1; prefix ^ "_" ^ string_of_int !count

The portability axis

OxCaml's portability axis was named in the previous lecture; now it gets developed. Two modes:

The submoding is portable ⊑ nonportable. A portable value can be used anywhere a nonportable value is expected ("safe to ship" is a stronger promise than "not safe to ship"). The reverse is rejected.

The critical constraint, and the bridge to the previous lecture: inside a portable function, all captured values from the outer scope are treated as contended. If the function runs on another domain, the original domain might be touching the captured values at the same moment; treating the captures as contended is exactly the right level of paranoia. Capturing is fine; what the contention rules then forbid is touching the captures' mutable parts.

The portability axis

Mode Meaning
nonportable (default) Not safe to send to another domain
portable Safe to call from any domain

Submoding: portable ⊑ nonportable.

A pure closure is portable

A pure function captures nothing the rules can object to. Press Run:

let test_portable () = let (f @ portable) = fun x y -> x + y in f 1 2 let () = Printf.printf "test_portable () = %d\n" (test_portable ())

Capturing a ref and mutating it is what falls afoul. Press Run and read the error carefully; it points at the captured r inside the portable closure:

(* Press Run; the mutation of the captured ref is rejected. *) let test_nonportable () = let r = ref 0 in let (counter @ portable) () = incr r; !r in counter ()

The error says the captured r is contended (the capture rule at work), but incr needs it uncontended. The two requirements collide, so the closure cannot be portable. This is the previous lecture's Rule 1, recruited to police closures.

And the racy gensym is exactly this shape: its closure captures count : int ref and mutates it on every call. If the closure ran on another domain, the original domain might still call gensym directly. Both domains would write to count. So gensym is nonportable.

A pure closure is portable

let test_portable () = let (f @ portable) = fun x y -> x + y in f 1 2 let () = Printf.printf "test_portable () = %d\n" (test_portable ())

Capturing a mutable ref: rejected

(* Press Run; the mutation of the captured ref is rejected. *) let test_nonportable () = let r = ref 0 in let (counter @ portable) () = incr r; !r in counter ()

gensym is the same shape: it captures and mutates count. Therefore nonportable.

Domain.Safe.spawn requires portable

OxCaml's parallel primitives bake the portability discipline into their signatures. The relevant entry point is Domain.Safe.spawn. Rather than quoting the documentation, ask the toplevel itself; press Run:

let spawn = Domain.Safe.spawn (* val spawn : (unit -> 'a) @ once portable -> 'a Domain.t *)

Read the reported signature carefully. The argument is a thunk unit -> 'a, with two mode annotations. portable is this lecture's subject: the thunk must be safe to hand to another domain. And once is an old friend from the linearity lecture: the spawned thunk will be run at most one time, so the API demands no more than that. Two axes, one signature, each doing its own job.

Vanilla OCaml has Domain.spawn without this annotation, which is why the gensym race can compile today. OxCaml's Domain.Safe.spawn adds the portability constraint, and the type checker enforces it.

Try to spawn the racy gensym, and the compiler stops you:

(* OxCaml refuses this at compile time; press Run. *) let _ = Domain.Safe.spawn (fun () -> gensym "x") (* Error: The value "gensym" is "nonportable" but is expected to be "portable" because it is used inside the function ... which is expected to be "portable". *)

The error is precise: gensym is nonportable, because it captures a ref it mutates; the closure handed to Domain.Safe.spawn is expected to be portable; the constraint cannot be satisfied. No race fires because the program does not compile.

Domain.Safe.spawn's signature

let spawn = Domain.Safe.spawn (* val spawn : (unit -> 'a) @ once portable -> 'a Domain.t *) (* Press Run; OxCaml refuses the racy spawn. *) let _ = Domain.Safe.spawn (fun () -> gensym "x") (* Error: The value "gensym" is "nonportable" but is expected to be "portable" ... *)

Ruling out data races: the two-step argument

The two axes now click together. Consider the scenario the previous lecture warned about:

Domain 1                    Domain 2
--------                    --------
t.mood <- Happy             t.mood <- Sad      <- DATA RACE

The mode system prevents this with a two-step argument:

  1. Portability ensures that any closure crossing a domain boundary is portable, which means it treats its captured values as contended.
  2. Contention ensures that contended values cannot have their mutable fields read or written.

Step 1 guards the only door a value has into another domain; step 2 disarms whatever walks through it. Together, the two rules make data-race freedom a compile-time guarantee: there is no way to assemble the racy picture above out of typechecked parts.

Ruling out data races: the two-step argument

Domain 1                    Domain 2
--------                    --------
t.mood <- Happy             t.mood <- Sad      <- DATA RACE
  1. Portability: closures crossing a domain boundary are portable.
    • so they treat captured values as contended.
  2. Contention: contended values cannot have mutable fields read or written.

Two rules, one guarantee: the racy picture cannot be assembled from typechecked parts.

Captures vs parameters

A subtle but important point. Portability constrains what a closure captures from its enclosing scope. It does not force parameters into any particular mode.

A portable function can still take a parameter at @ uncontended and mutate it inside the body, because parameters are supplied fresh at each call. The function itself does not capture them; the caller hands them in. This split matters because parallel APIs hand callbacks an explicit token (a scheduler handle, a slice, a parallel-context tag), and that token can carry whatever mode the API specifies even though the callback is portable.

A small example, with the loop split out of the enclosing scope:

let factorial_portable n = let a = ref 1 in let rec (loop @ portable) (a @ uncontended) i = if i > 0 then begin a := !a * i; loop a (i - 1) end in loop a n; !a

Here loop is portable: it captures nothing from outside. It accepts a as a parameter at mode @ uncontended and mutates it freely; the caller (the outer function) holds a with full rights and passes it in. This split is the way to write portable callbacks that still need to read or write to a mutable accumulator.

Captures vs parameters

let factorial_portable n = let a = ref 1 in let rec (loop @ portable) (a @ uncontended) i = if i > 0 then begin a := !a * i; loop a (i - 1) end in loop a n; !a

The first working program: Portable.Atomic

Now the payoff: a gensym that two domains can share, with the compiler's blessing. OxCaml ships Portable.Atomic (from the portable library), an atomic that crosses both the contention and portability axes: since its operations synchronise, it can be used freely where either mode is expected. The stdlib Atomic from the previous lecture predates the mode system, and its signatures carry none of these annotations; in OxCaml code, reach for Portable.Atomic.

One packaging detail before the cell. We wrap the counter and gensym in a small module:

[@@@alert "-do_not_spawn_domains"] module Gen = struct open Portable let count = Atomic.make 0 let gensym prefix = Atomic.incr count; prefix ^ "_" ^ string_of_int (Atomic.get count) end let d = Domain.Safe.spawn (fun () -> Gen.gensym "y") let s1 = Gen.gensym "x" let s2 = Domain.join d let () = Printf.printf "%s %s\n" s1 s2 (* x_2 y_1 here: the spawned thunk ran first and drew 1. The x/y split varies with timing. *)

Two domains, a shared atomic counter, no race; the compiler verified all of that before anything ran. The toplevel reports module Gen : sig ... end @@ portable: mode inference noticed that every value inside is portable and marked the whole module portable, so Gen.gensym can be read out of it at portable mode by the spawned closure.

Why the module wrap, instead of let (gensym @ portable) = ... at the top level? A quirk of the toplevel: a bare let lands in the implicit toplevel module, which itself sits at the default nonportable mode. When Domain.Safe.spawn later reads gensym back out of that module, it sees a nonportable binding and refuses, even though the closure was verified portable at its binding site. Wrapping in module Gen gives the closure a portable home. In a real .ml file the whole compilation unit is a module that mode inference can mark portable wholesale, so this dance is not necessary.

The first working program: Portable.Atomic

module Gen = struct open Portable let count = Atomic.make 0 let gensym prefix = Atomic.incr count; prefix ^ "_" ^ string_of_int (Atomic.get count) end

Activity

We try to annotate a logging function as portable. (Predict the verdict, then press Run.)

let (f @ portable) = let log = Buffer.create 16 in fun x -> Buffer.add_string log (string_of_int x ^ "; "); x + 1

Why does the compiler refuse?

Why: Portability is determined by what a closure captures from its enclosing scope. f captures log, a mutable buffer, and Buffer.add_string writes through it. Inside a portable closure the captured log is treated as contended (the capture rule), and the contention rules forbid touching a contended value's mutable parts. The return type, the helper calls, and the top-level position are not the issue. The compiler points at log in its diagnostic.

A team replaces a ref counter with a stdlib Atomic.t, and the runtime race is genuinely gone. Porting the code to OxCaml, they swap it again, for Portable.Atomic. What does the second swap buy?

Why: Both atomics synchronise at runtime; the runtime behaviour is the same. The difference is what the type checker can see. Portable.Atomic is the modes-aware atomic: its signatures carry the annotations that let it cross the contention and portability axes, so portable closures can capture it and Domain.Safe.spawn accepts the result. The stdlib Atomic predates the mode system; nothing in its signatures tells the checker its operations synchronise.

Show reference solution

Q1 is rejected: the closure captures a mutable Buffer.t and writes to it inside the body. A portable closure's captures are treated as contended, and a contended value's mutable parts cannot be touched.

(* Q1: rejected. Run it. *) let (f @ portable) = let log = Buffer.create 16 in fun x -> Buffer.add_string log (string_of_int x ^ "; "); x + 1

Q2: the runtime behaviour of the two atomics is the same; the swap buys the annotations. Portable.Atomic's signatures tell the checker its operations synchronise, so it crosses both the contention and portability axes; the stdlib Atomic predates modes and tells the checker nothing.

Common pitfalls

Pitfall 1: "Atomics make my closure portable." The runtime race is fixed by any atomic. What the checker accepts depends on the annotations it can see: the stdlib Atomic predates modes and carries none, while Portable.Atomic declares that its operations synchronise. In OxCaml code, use Portable.Atomic.

Pitfall 2: "Portability bans captures." It does not ban them; it disciplines them. A portable closure's captures are treated as contended, so any use that needs more (a write, a read of a mutable field, a call that demands an uncontended argument) is rejected. Immutable captures sail through, because there is nothing mutable to race on.

Pitfall 3: "Portability and contention are the same axis." They are not. Contention (the previous lecture) tracks how a value may be accessed once shared; portability tracks whether a value may cross a domain boundary at all. The two-step argument needs both: portability guards the door, contention disarms what walks through it.

Pitfall 4: "I can mutate a parameter inside a portable closure." Yes; portability constrains captures, not parameters. A @ portable function can take an @ uncontended parameter and mutate it freely: parameters are supplied fresh at each call.

What portability buys you

What's next

The module closes with the tutorial: a bracketed resource-management API built from the resource axes, attacked three ways, and compared against a production OxCaml signature that uses every axis the module taught.

What's next

Reading

Sources

The gensym example, the portability capture rule and its test_portable / test_nonportable demos, the two-step data-race argument, the captures-versus-parameters example, and the Portable.Atomic "first working program" are adapted from the CS6868 OxCaml handout and slides, Part 2 (the instructor's own teaching material, freely reusable). The framing as a sequel to the contention lecture is original to this course. See LICENSES.md at the repository root for the full source posture.