filter: keep what passes the predicate
filter is the second of the three canonical higher-order list
functions. Where map transforms every element
and keeps them all, filter keeps each element exactly as it was,
but drops the ones that fail a test. Its type signature, ('a -> bool) -> 'a list -> 'a list, encodes that promise: the function
argument is a predicate (it returns a bool), and the result list
has the same element type as the input. No transformation; just
selection.
The pair map + filter covers a remarkable amount of everyday
list manipulation. The rest of this lecture works through the
definition, the common variations (filter_map, partition), and
the points where you should pause and reach for filter rather than
something else.
Writing filter from scratch
Just as we did with map,
let us start from two concrete functions that share a shape.
Both walk a list. Both keep an element if some condition is true and
drop it otherwise. The condition is the only thing that differs:
h mod 2 = 0 vs h > 0. Factor the condition out as a parameter
p (for predicate):
Now evens and positives collapse to one-liners:
That is the entire trick. The standard library calls this function
List.filter, and the implementation is essentially what we wrote.
The library version is also stack-safe on long lists, which we
will come back to in a moment.
Type and what it tells you
The signature of filter:
val filter : ('a -> bool) -> 'a list -> 'a list
The element type appears in three places, and it is the same 'a
in each: the predicate takes an 'a, the input is a list of 'a,
the output is also a list of 'a. Filtering cannot change the type
of the elements. If you find yourself wanting to "filter and
transform," you want filter_map,
which we will see below.
The output is always a sublist of the input, in the same order. The relative order of elements is preserved: filter never reshuffles.
Examples
The argument order matters: predicate first, list second. This is
consistent with List.map (function first, list second) and with
List.fold_left
(function first, accumulator second, list third). The convention is
that "the part that varies between calls" (the function argument)
comes first; the list comes last. This lets you partially apply the
function argument and get back a useful specialised function, like
evens = filter (...) above; the rule of thumb is "data goes last."
Combining map and filter
A great deal of everyday list processing is "map, then filter" or "filter, then map":
big_squares returns [16; 25]: square each of 1..5 to get
[1; 4; 9; 16; 25], then keep those above 10 to get [16; 25].
The pipeline operator |> is just function application written in
the other direction: x |> f is exactly f x. We will dedicate
Lecture 5 to it. For now, notice that the
pipeline reads top to bottom as a clear sequence of steps: start
with the list, square, filter. The alternative is nested calls:
Same answer, but you have to read inside-out: find xs at the
deepest level, then map, then filter. With three or more steps
the inside-out version becomes unreadable; the pipeline form keeps
its clarity.
filter preserves relative order
A property worth stating explicitly:
Elements that passed (> 3) appear in the
same order they did in the input. This matters more than you might
initially think: if the input was a sorted list, the output is still
sorted; if it was a chronologically-ordered log, the output is still
in chronological order. You do not have to re-sort after filtering.
filter_map: filter and transform in one pass
Often you want to keep and transform each element. A common example: parse a list of strings into a list of integers, dropping the strings that did not parse. With the tools we have so far:
This works, but it walks the list three times and uses
Option.get, which raises an exception if it ever sees a None.
The code knows the Nones are gone, but the type system does not.
The standard library function List.filter_map packages "keep some,
transform the kept ones" into a single primitive. The function it
takes returns an 'a option: None means "drop this element,"
Some y means "keep y in the output."
The result is [42; 13; 0]: the three strings that parsed, in
order, as raw integers. The strings "frog" and " " returned
None from int_of_string_opt and were dropped.
filter_map is one of those "where was this all my life" functions
once you discover it. Any pipeline of "parse, drop the failures,
move on" benefits from it. Its type:
val filter_map : ('a -> 'b option) -> 'a list -> 'b list
The element type can change ('a to 'b), unlike plain filter.
And the result length can shrink, unlike plain map. filter_map
is the most general of the three "walk a list and transform"
operations: map is filter_map where the function always returns
Some; filter p is filter_map (fun x -> if p x then Some x else None).
partition: keep both halves
Sometimes you want to keep the elements that fail the predicate
as well. You could call filter p and filter (fun x -> not (p x)) separately, but that walks the list twice. The standard library
provides List.partition to do both in one pass:
The result is a pair of lists: passed = [85; 73; 95], failed = [42; 30; 58]. Both lists preserve the original order. The type:
val partition : ('a -> bool) -> 'a list -> 'a list * 'a list
Use partition whenever you would otherwise call filter twice on
the same list with complementary predicates: it is a single pass and
makes the intent obvious to a reader.
A real-world filter pipeline
Filter and map together let you express SQL-style "select fields where condition holds" queries very compactly. Suppose we have a list of books:
The result is ["OCaml"; "Rust"]: those are the books from 2020 or
later with more than 100 pages.
The same query in SQL would be `SELECT title FROM library WHERE year
= 2020 AND pages > 100
. The OCaml version is one filter plus one map. This filter-then-map pattern (sometimes called *select-where* in database lingo, or *project-select* in relational algebra) is so common that in some communities people call OCaml programming "functional querying" once they discover it. Withfilter_map` the same thing collapses to one call:
One pass instead of two; same result.
Tail recursion and filter
We saw with map that the naive recursive implementation is not
tail-recursive. The same is true of filter:
In the "keep" branch, the cons h :: filter p t does work after the
recursive call. So filter sits on the stack the same way map does.
You can make it stack-safe by hand with exactly the
accumulator-and-reverse trick we showed for map:
The standard library does not need the trick: just as with
List.map, modern OCaml compiles List.filter with tail recursion
modulo cons, so the natural cons-onto-the-recursive-call definition
is stack-safe out of the box. For practical purposes, both versions
are fine on lists of any length you are likely to meet.
A quick check
What is the type of List.filter (fun x -> x > 0)?
int list -> intint list -> int listint -> int listint -> bool
Why: List.filter has type ('a -> bool) -> 'a list -> 'a list. Apply it to a predicate int -> bool, and you get back the
specialised type int list -> int list. Note the predicate is
(>) 0-like, and once supplied, only the list argument is left.
Which of these are equivalent to List.filter p xs?
List.filter_map (fun x -> if p x then Some x else None) xsList.map p xsfst (List.partition p xs)List.find p xs
Why: the first option turns the predicate into a "return Some
x if it passes, None otherwise" function and uses filter_map, so
yes. The third option uses partition and takes the "passed" half,
also equivalent. The second option (List.map p xs) returns a
bool list, not the filtered list. The fourth option (List.find)
returns the first matching element (or raises), not the whole
sublist.
A code challenge:
We used List.partition above; now build it yourself. Write
partition : ('a -> bool) -> 'a list -> 'a list * 'a list that
returns the elements satisfying the predicate and the elements
failing it, as a pair of lists, each preserving the input order.
Do not use List.partition or List.filter; recurse directly.
Show reference solution
Reference solution:
let rec partition p = function
| [] -> ([], [])
| h :: t ->
let (yes, no) = partition p t in
if p h then (h :: yes, no) else (yes, h :: no)
Partition the tail first, destructure the resulting pair with a
let pattern, then cons the head onto whichever half it belongs
to. Both halves come out in input order because each head is
prepended to an already-ordered tail result.
Activity
Show reference solution
List.mem is not magic
We leaned on List.mem to check membership. It is itself a small
recursive function, the same shape we have been writing all
along:
|| short-circuits: as soon as h = x is true, the rest of the
list is not traversed. Worst case (the element is missing or
last) is a full walk; that is the O(n) we counted toward the
overall O(n^2) of unique.
Dropping the order-preserving rule
The O(n^2) cost above buys us one specific property: the output
keeps the original first-appearance order. If we relax that
property and let the output be sorted instead, we can do much
better. The trick is to sort the input first (O(n log n)) and
then walk it once, keeping the first of every consecutive run:
List.sort compare is the stdlib's general-purpose sort
(O(n log n) worst case). After sorting, duplicates sit next to
each other, so the linear sweep go just skips runs of equal
values.
What's next
We have map (transform every element) and filter (keep a
sublist). Both build new lists. The next lecture,
fold, builds anything: a number, a string,
a record, another list. It is the most general of the three, and is
the one that subsumes the other two.
Reading
- Cornell CS3110, Filter: https://cs3110.github.io/textbook/chapters/hop/filter.html
- Real World OCaml, Lists and patterns: https://dev.realworldocaml.org/lists-and-patterns.html
Sources
This lecture's prose, worked examples, and quizzes are original to
this course. Materials referenced during preparation are listed in
the Reading section above; Cornell CS3110 and Real World OCaml
are CC BY-NC-ND-licensed and have not been derivatively reused.
See LICENSES.md
at the repository root for the full source posture.