Benchmarking a Binary Search Optimization, from 1991 to 2024
While reading Chris Okasaki’s Purely Functional Data Structures I became aware of an interesting optimization for binary search proposed by Arne Andersson in his 1991 paper, A Note on Searching in a Binary Search Tree. This blog post is an investigation of this optimization, including some updated benchmarks in Go and OCaml.
Binary Search Algorithms
As a reminder, a binary search tree can be thought of as a data structure consisting of empty leaves and non-empty nodes; each node contains a value and pointers to left and right subtrees. A binary search tree must satisfy the BST Invariant: For any node n, every node in the left subtree of n has a value less than n’s value, and every node in the right subtree of n has a value greater than n’s value. This naturally suggests the following algorithm for binary search, implemented in Go:
type Tree struct {
	v    int   // value stored at this node
	l, r *Tree // left and right subtrees (nil for empty leaves)
}

func BSearch(t *Tree, n int) bool {
	for t != nil {
		if n < t.v {
			t = t.l
		} else if n > t.v {
			t = t.r
		} else {
			return true
		}
	}
	return false
}
This algorithm has a worst-case time complexity of O(n), realized in a maximally unbalanced tree where we must traverse every node. In general, the best way to optimize an algorithm is to improve its big-O complexity: In the case of a binary tree, this would often take the form of implementing a self-balancing binary tree, like a red-black tree (hopefully to be discussed in a later blog post). However, the possibility of constant-factor improvements should not be ignored when looking for optimizations, particularly on smaller data sets where constant factors may be more significant.
Arne Andersson points out that in the worst case, this binary search algorithm has to perform two comparisons on each node: First a less-than, and then a greater-than. Is it possible to perform only one? Some languages have a builtin three-way comparison operator, but many do not. Even without a three-way comparison operator, it’s possible to search a binary tree with only one comparison in the loop. This is achieved by keeping track of a candidate match, and then performing a second comparison only once at the very end. Here’s an example implementation in Go:
func BSearchOneComp(t *Tree, n int) bool {
	var candidate *Tree
	for t != nil {
		if n < t.v {
			t = t.l
		} else {
			candidate = t
			t = t.r
		}
	}
	if candidate != nil && candidate.v == n {
		return true
	}
	return false
}
Suppose we are searching the tree for a value n which it contains, and suppose we reach the node holding n. The candidate will now be set to that node. Given the BST Invariant, every value in that node's right subtree is greater than n, so from this point we can only make left turns, which do not update the candidate. The correct candidate is thus preserved until the loop terminates.
The advantage of this potential optimization is that fewer numerical comparisons need to be made: In the worst case, one comparison per node on the search path plus a single equality check at the end, rather than two comparisons per node. The cost of this potential optimization is that even if a match is near the root of the tree, we may still need to traverse a whole path down to a leaf before confirming that the candidate is a match. Benchmarking is therefore required.
Andersson’s 1991 Benchmarks
Here I reproduce the 1991 results from Andersson:
That’s a blast from the past! (Let’s not get into reading the Modula-2 sample code to even understand the algorithms.) These are some pretty significant results, on the order of a 30% or more speedup in most cases. As Andersson notes:
[E]ven though the three-way comparison may exist at the machine-level this operation is not available in most high-level programming languages. […] [S]o far the author has never observed a compiler which actually translates these two comparisons in to one three-way comparison. (1125)
Thus implementing the one-comparison algorithm in code could lead to significant improvements.
Two Case Studies in 2024: Go and OCaml
However, these results are ancient. Compilers may have improved, and languages may have improved. Indeed, some languages now provide a three-way comparison operator (such as C++20's <=>), though as far as I know, x86 still offers no single three-way comparison instruction at the assembly level. Obviously there's a lot to potentially study here, and I hope to expand this discussion in future posts. For now, I want to focus on two case studies: Go (1.23) and OCaml (5.2.0).
All the code and test results for these studies are available on my GitHub.
Go
It is well-known that Go has extensive built-in support for benchmarking via the testing package, which makes our job easier. Since computers are a little faster than they were in 1991, I performed tests on two sizes of tree. Note that in these results, a lower number is better.
Randomly inserted tree of 5000 integer elements, test performed five times, and results averaged:
OneComp: 94.77 ns/op
TwoComp: 81.33 ns/op
The one-comparison algorithm is 16% slower! How about with larger trees of 100000 elements, again with the test performed five times and the results averaged?
OneComp: 197.92 ns/op
TwoComp: 178.70 ns/op
The one-comparison algorithm in Go is still about 10% slower!
OCaml
What about OCaml? OCaml is how I initially got into thinking about some of these algorithms, and it provides the opportunity to test recursive implementations of these two binary search algorithms. Unfortunately, there appear to be significantly fewer resources for benchmarking in OCaml. I ended up using ocaml-benchmark, which hasn't been updated in several years. This library has a few issues. First, it seems that the tabulation support for pretty-printing the results does not work on the newest versions of OCaml. Second, it requires the user to estimate an appropriate sample size for benchmarking.
For OCaml, I did 10 million iterations with trees of 5000 and 100000 elements. The OCaml benchmark library reports results differently from Go: as a number of iterations per second. I converted these to ns/op (ns/op = 10^9 / iterations per second) to make them comparable to the Go benchmarks.
Averaged results from five tests with a 5000-element, randomly inserted binary tree:
onecomp: 4893266.53/s = 204.36 ns/op
twocomp: 4436230.84/s = 225.42 ns/op
Averaged results from five tests with a 100000-element, randomly inserted binary tree:
onecomp: 3247907.21/s = 307.89 ns/op
twocomp: 2949312.62/s = 339.06 ns/op
With OCaml, Andersson’s algorithm is approximately 10% faster, with no significant difference between the trees of two different sizes.
Of course, these benchmarks also show that OCaml is significantly slower than Go on randomly inserted integer binary trees of the same size, although the gap is narrower with the larger trees.
The Verdict, and What’s Next
Obviously, the answer is: It depends! Optimization can only be done properly on the basis of benchmarks for your own language and workload. It seemed unlikely that these benchmarking results from 1991, still cited by Okasaki in 1998, would be totally robust. The benchmarking differences I found for Go and OCaml were significantly smaller than those found by Andersson. It's also worth recognizing that the algorithm is not an improvement in Go at all, and while it is an improvement in OCaml, it's only a small one relative to the comparative slowness of the language.
There are two directions to pursue next: First, testing with balanced binary trees, which I intend to get to this weekend. Second, it would be interesting to do a comparison with other languages, particularly one like C++ which (as of C++20) has a three-way comparison operator, although I probably won't get to this.