The alias method, or transforming data to increase efficiency (Clojure).

Posted by

This post is dedicated to describing a wonderful and beautiful small algorithm that is not widely known. It allows to pick a random element among an enumerated distribution of elements, where each element is associated a weight.

This need is frequently encountered, for instance in property based testing libraries. In test.check, it correspond to the library function frequency which allows to select a random generator based on frequency of occurence (the weight).

We will first describe two classic methods to solve this problem before jumping to the Alias Method, a very efficient algorithm. We will describe how it works and give one possible implementation.

We will end this post by taking some distance and discuss the importance of data transformations to build efficient algorithms, and one reason that might explain why tend to fail to see these transformation so often in our everyday code.

 

Stating the problem


As input, we have a bunch of elements xs and each element is associated a weight. This weight is proportional to the probability of picking that element among xs. We call this kind of distribution an enumerated distribution.

Our algorithm should output a random generator such that:

  • Each time we call it, it outputs an element from our elements xs
  • The probability of taking x from xs is P(x) = Weight(x) / TotalWeight
  • The TotalWeight is the sum of Weight(x) for all x of the elements xs

We will name the function that implements this algorithm (and returns the desired generator) enumerated-distribution-gen.

 

Usage example


Let us take as input a map associating to the keyword :a the weight 1, to :b the weight 2 and to :c the weight 1. We expect our enumerated-distribution-gen function to output a random generator that will:

  • Output :a 25% of the time
  • Output :b 50% of the time
  • Output :c 25% of the time

Here is how it would translate in term of Clojure code:

We will now go through different implementation of this algorithm.

 

Linear naive algorithm


The most naive algorithm consist in computing the sum of the weights, which we will name total-weight and to randomly pick a number between 0 and this total-weight (excluded). We then linearly scan the collection to find the appropriate match.

For instance, if we pick as input {:a 1, :b 2, :c 1}, we compute its total-weight, yielding 4. Each random generation will randomly generate an double d between 0 and 4, and scan the collection to find the correct match:

  • If d is in [0, 1), we output :a
  • If d is in [1, 3), we output :b
  • If d is in [3, 4), we output :c

 

Implementation

Finding the correct match is rather simple: we subtract to d the weight of each element. The first element that makes d strictly negative is the one we select.

This function makes use of the sum-weights helper function that sums the weight of our input elements:

We can run our function a few time to see how it behaves:

 

Alternate implementations

Equivalently, we can precompute the cumulative sum of the weights, and select the first cumulative weight that is higher than the number d we generate. In terms of algorithm complexity, these implementations are equivalent to the one provided above:

  • O(N) pre-processing time (summing the weights)
  • O(N) processing time at each rand (scanning)

Two implementations (from Rich Hickey and Stuart Halloway) of this variant of the naive algorithm are shown in this stack overflow thread.

 

Binary search


The implementation based on cumulative weights gives us some insight on how we can improve our algorithm. Since cumulative weights form an increasing sequence of weights, the weights are sorted and so we can binary search into it.

Based, on binary search, we can improve our algorithm complexity to:

  • O(N) pre-processing time (summing the weights)
  • O(log N) processing time at each rand (binary search)

In case the random generator output by enumarated-distribution-gen is used a lot of times, this can yield some interesting performance boost. We will not delve further into this. The implementation is very similar to the previous one, and is left as an exercise for the reader.

 

Alias method


We will now cover the alias method, an algorithm that allows to randomly pick an element from an enumerated distribution in O(1), given a typical O(N log N) pre-processing time.

In this section, we will describe the algorithm, starting with an analogy with buckets of water, before moving to the implementation in the next section.

 

Let’s talk about buckets

Let us imagine a bucket of water of volume Vtotal, where Vtotal is the sum of the weights of the elements in the enumerated distribution. We fill this bucket such that each element x is associated with the volume Weight(x).

This bucket equivalent to our enumerated distribution. Our {:a 1, :b 2, :c 1} initial example can be viewed as as bucket of 4 litres capacity, filled with:

  • 1 litre of :a
  • 2 litres of :b
  • 1 litre of :c

In this view, picking one random element in our enumerated distribution is equivalent to randomly picking (with a uniform distribution) one random atom in the bucket.

 

Splitting in several buckets

Things start to be interesting when we split our single bucket of 4 litres into 2 buckets of 2 litres each, and fill them intelligently:

  • We can fill the first bucket with our 2 litres of :c
  • We can fill the second bucket with 1 litre of :a and 1 litre of :c

To randomly pick an element, we first randomly pick one bucket. The selected bucket is a new enumerated distribution, smaller than the initial one.

 

A good split

It is important to note that we should divide our initial bucket of volume total-weight into buckets of equal volumes. This makes the random selection of one of the K bucket a simple uniform integer roll between 0 and K-1. Had we not do that, we would end up with an additional enumerated distribution to select our bucket.

The trick is then to choose our number of bucket K and the way we fill each of these buckets in such a way that each of the sub-problems are made so simple than they can be decided by a simple random roll. The Alias method is such a way.

 

The Alias Method

The Alias Method gives us a way to split our probability volume into buckets, such that these buckets are filled with at most 2 elements of our initial enumerated distribution. This makes it possible to randomly pick an element in two random rolls:

  • One integer roll, to pick the bucket
  • One double roll, to pick an element from the bucket

We select our number of bucket K as being equal to the number of xs elements, denoted N, minus one. Each bucket has a volume Vbucket equal to Vtotal divided by N – 1.

We fill each bucket by picking an element with a weight W inferior to its volume Vbucket. This leaves Vremaining (VbucketW) as remaining volume to fill. We fill it with an element whose weight is higher than Vremaining (it exists by construction).

This process will completely drain one element at each iteration, and consume some of the volume of another element as well. We repeat this process until all the buckets are filled, at which point each of our elements are drained (by construction).

 

Example

Let us consider the following example: {:a 1, :b 4/3, :c 1, :d 2/3}:

  • The total volume Vtotal equal to 4
  • The number of bucket K equal to 3 (4 elements – 1)
  • Each bucket has a volume Vbucket equal to 4/3

We start with all buckets empty:

Elements = {:a 1, :b 4/3, :c 1, :d 2/3}
Bucket 1 = Empty
Bucket 2 = Empty
Bucket 3 = Empty

Step 1: We fill the first bucket with :a whose volume 1 is inferior to Vbucket (4/3). There is still 1/3 litres to fill in this bucket. We take them from element :b:

Elements = {:b 1, :c 1, :d 2/3}
Bucket 1 = {:a 1, :b 1/3}
Bucket 2 = Empty
Bucket 3 = Empty

Step 2: We fill the second bucket with :c whose volume 1 is inferior to Vbucket (4/3). We complete the remaining 1/3 litres by taking them from our element :b again:

Elements = {:b 2/3, :d 2/3}
Bucket 1 = {:a 1, :b 1/3}
Bucket 2 = {:c 1, :b 1/3}
Bucket 3 = Empty

Step 3: We fill the third bucket, whose volume is 4/3, with the remaining 2/3 litres of :b and :d. This completes our distribution in buckets:

Elements = Empty
Bucket 1 = {:a 1, :b 1/3}
Bucket 2 = {:c 1, :b 1/3}
Bucket 3 = {:b 2/3, :d 2/3}

We are now done. To randomly select an element, we pick a random bucket uniformly and then roll a double between 0 and 1. If the roll is higher than the ratio of the first element in the bucket, we pick the second element, else the first element.

For instance, we pick {:c 1, :b 1/3} by first rolling 1. Then we pick :b by rolling 0.9 which is higher than the ratio of :c in the bucket (3/4).

 

Implementing the Alias Method


Now that we know how the Alias Method works, we will implement it. Since the Alias Method stays voluntarily vague on the order in which we can pick our elements to fill the buckets, there are several possible implementations available for us.

 

Chosen implementation

We will use the nice semantics of sorted-set to help us. We will store each element and its associated weight as a key in a sorted set of pairs [weight value]. So the weight will act as primary attribute in the ordering: the elements will be sorted by increase weights.

As a consequence, by construction, the first element of the sorted set will have a weight inferior to the volume of a bucket, and the last element will have a weight superior to the remaining volume of the bucket.

At each iteration, we will therefore pick the first element of the set, drain it completely to fill the bucket as much as possible. We will then pick the last element of the set and drain enough of it to fill the remaining of the volume.

 

High level Clojure code

The algorithm translates into the following Clojure code, in which we first check if the input collection has enough element:

  • If we have just a collection of one element to pick from, we short circuit and return a function that constantly returns the same number
  • Otherwise, we construct our vector of buckets with enum-dist->buckets, and implement our algorithm based on two uniform random rolls

 

Clojure Code to create buckets

The construction of the buckets is based on the sorted-set collection of Clojure, and always picks the first and last element of the container to fill the current bucket:

Where fill-one-bucket is the function that implements the draining of the quantities to-verse into a bucket. It is the core of the algorithm:

We can test our fill-one-bucket function in the REPL and see how it implements one iteration of the Alias Method:

We can also test the enum-dist->buckets and check that it transforms our enumerated distribution into buckets correctly:

 

Conclusion: data transformation vs code tuning


In this post, we saw how to generate random number from an enumerated distribution containing N element, in constant time, given a pre-processing time of O(N log N).

 

Simple yet effective data transformations

Aside from the algorithm itself, the fascinating bit about the Alias Method is how simple it ends up being: it is about transforming pairs of value and probabilities, into tuples of two elements and a probability.

It shows that switching between data representation can unlock unsuspected performance improvements, and also show that it does not systematically require complex data structure to speed up an algorithm tremendously.

 

Failing to see these transformations

The Alias Method is pretty smart: it is not something we would naturally come up with. But it asks the question of how many data transformation we miss, that would improve our run-time efficiency tremendously, instead of focusing so much of our efforts on code tuning.

In my years doing professional development, I have seen plenty of simple yet missed opportunities of data transformation which would have improved the efficiency of the code by much. We do miss a lot of trivial ones, and it is not because they are hard.

We often simply do not look for them. Lots of what we learned in OOP books makes us think in terms of fixed entities, with one chosen representation, and very few incentives to actually copy data to transform it into another form (mapping it).

While having several ways to represent the same data will certainly comes at a cost if introduced gratuitously, this might be the key to unlock better performance in our software as well, and not stay stuck in tuning a sub-optimal algorithms.

 

Escaping the trap in our minds

Clojure (and Functional Programming in general) avoids us falling into the trap of the “one representation to rule them all” by making us love data transformation.

As shown in this post, we can leverage this love for data transformations to build smarter data structure for more efficient algorithm. And I will certainly work on doing it more often in the years to come.