In today’s post, we will discuss the design and implementation of a small headers only library, whose aim is to complete the STL random header with combinators to generate arbitrary random data.
Throughout this post, we will first describe the motivations behind this library, and the its high level goals. These goals will imply design choices that we will discuss as well. We will then go deep into the implementation of the main features, before concluding and discussing potential improvements.
The second goal of this post is to show how thinking in terms of concepts such as Functors and Applicative can help us find powerful design constructs to base ourselves upon when building an API.
Motivation
The STL random header provides us with random number engines and random distributions. The random engines produce randomness, and the distributions consume this randomness to create random data: instances of booleans, integers and floating point number, all following some probabilistic properties.
The decoupled design of the STL random header offers a strong basis to build upon, for real applications requiring us to generate instances of more complex data types.
Indeed, in general, we have the need to generate random arbitrary sets of data, from the simplest tuples to more complex data types. For instance:
- Random element inside a collection of possible values
- Random tuples (like a 3D coordinate)
- Random strings, or other STL containers
- Random custom data types (like RGB colours)
- Random sum types, such as optional or variant
The goal of the tiny_rand library we will build today is to satisfy these needs. We will build upon the STL and complete it with the ability to randomly generate any kind of data we might want.
High level goals
Before discussing the design of tiny_rand library, it is useful to state and describe its high level goals. And these are:
- To build on top of the STL, making sure the integration is easy
- To be as light as possible in terms of dependencies (headers only)
- Te be complete, making sure we can generate any kind of data we want
- To be concise and small, with a small set of functions covering our needs
These high level goals will influence our high level design choices. This is the subject of our next section.
High level design choices
This section will describe the design choices we will follow to implement the tiny_rand library. These main lines behind choices are connected to our high level goals, although there are other possible variations on it.
Mersenne Twister
In this talk rand() Considered Harmful, Stephan T. Lavavej encourages to use the std::mt19937 as our default pseudo random generator. To keep our API simple, we will select this random engine specifically, and build on top of it.
We could choose to enlarge the API to support the other pseudo random generators. This can always be done later: we will keep it simple for now.
Essence of generators
With the std::mt19937 as our source of randomness, our goal is build more distribution-like generators. These generators will extend the concept of distribution and allow us to generate instance of arbitrarily complex type.
Looking at the std:: uniform_int_distribution, we see that it defines an operator() that takes a random engine as parameter. Simplifying things a little, we can see it as a function from a source of randomness (like std::mt19937) to a value of type int.
We can see there the essence of a random generator in the design of the STL: a callable object from a source of randomness to a type T the generator will generate instances of. We will follow this design of generators as modelled by functions.
Composition, no inheritance
OOP often drives us toward the use of objects, and quite easily toward using complex design patterns. There might be several design patterns that might seduce us here, like the Template Method.
Here is an example of usage of Template Method to handle the generation of random generators. It delegates to the generate_element the responsibility to generate the values inside the vector.
*-------------------* | IGenerator[T] | *-------------------* (Interface for generators) | + operator(): T | *-------------------* ^ | | *----------------------------* | VectorGenerator[T] | ( TEMPLATE METHOD ) *----------------------------* (Abstract implementation) | + operator(): vector[T] | (and generate_element as) | - generate_element(): T | (the customisation point) *----------------------------*
But this design has an important flaw: it is not composable. We cannot easily handle vector of vector of list of integers. Each layer would bring new generate_element sub-functions, and it becomes a real mess.
Because of this, we will stay away from the Template Method and rely instead on direct injection (*). The vector generator will take as parameter a generator to create its values.
Our design will rely on defining higher order generators, combinators, to combine other generators, and return new generators. This approach automatically composes and will allow us to generate any data we want, with only a few combinators.
(*) In general, I would recommend to stay away from the Template Method design pattern. Because it relies on inheritance, it couples things much more than composing Strategy. Overall, I think this is more often an anti-pattern than a pattern to follow.
Headers only, no dependencies
In order to make the library as light as possible, tiny_rand will be built as a headers only library. To avoid participating to the dependency hell syndrome, it will not be allowed to depend on any third parties library, including Boost.
Primitives
The STL offers generators for booleans, integers and double, but it does not offer ways to generate characters. As we will see, satisfying this need will be a bit more tricky that just adding one distribution.
Character generator
It seems the STL forgot about char generation. We will start by correcting this grave injustice with our char_gen generator:
inline auto char_gen() | |
{ | |
return [=](std::mt19937& bit_gen) -> char | |
{ | |
std::uniform_int_distribution<int> distribution{ | |
std::numeric_limits<char>::min(), std::numeric_limits<char>::max()}; | |
return distribution(bit_gen); | |
}; | |
} |
This generator will generate characters that cannot be properly printed on the screen. It points to the need to generate string with a restricted set of characters.
Reduced character set
We could reduce the range of allowed character by specifying a range, and deal with it the same way the STL distribution do it for integers. But characters are not integers: there are much fewer, and they have a different semantic.
We will instead introduce a more general way to pick an element in a finite (and small) set of values, the choice generator:
template<typename Container> | |
auto choice_gen(Container const& values) | |
{ | |
return [=](std::mt19937& bit_gen) -> typename Container::value_type | |
{ | |
std::uniform_int_distribution<int> distribution(0, values.size() - 1); | |
return values[distribution(bit_gen)]; | |
}; | |
} |
From this choice_gen generator, we can build generators for alpha-numeric characters, or any other kind of reduce character set we want. We use it below to generate letters:
inline auto letter_gen() | |
{ | |
static const std::string ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"; | |
return choice_gen(ALPHABET); | |
} |
This generator is however much more general: you can also use it to generate values inside an enumeration which is a useful primitive to handle as well.
Playful character generation
We can be more imaginative than just generating letters. We could for instance write a generator that creates valid identifiers for C++ functions or classes:
inline auto cpp_identifier_gen(int max_size) | |
{ | |
using namespace tiny_rand; | |
static const std::string HEAD_CHARS = "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"; | |
static const auto head_char_gen = choice_gen(HEAD_CHARS); | |
static const auto rest_gen = string_gen(max_size - 1, choice_gen(HEAD_CHARS + "1234567890")); | |
return [=](std::mt19937& bit_gen) -> std::string | |
{ | |
return std::string {1, head_char_gen(bit_gen)} + rest_gen(bit_gen); | |
}; | |
} |
This code makes use of the string_gen, which allows to generate strings with a maximum length, and is parameterised by a character generator. We will see how to implement it later in this post.
This cpp_identifier_gen will produce perfectly valid C++ identifiers. Feel free to use it, if ever you lack inspiration in finding good-hard-to-decipher function names…
sbYg8Nq_1kIf | |
gKvRV9 | |
vtv | |
nc0TZJskfmPKBj3TzikzwNPjbsoKhg | |
tIOWNUdojwgpujCcr7wli1r9eY | |
yYWpELqExw65UeeGAdmwQP4 |
Functors and Applicatives
This is not going to be a lecture on Category Theory. We will however use the Math concepts of Functor and Applicative to get a first shot at a good design.
One thing that Haskell practitioners quickly discover is that Functors and Applicative concepts are very general.
- They occur quite often and quite naturally.
- They compose well with other pieces of software.
- They almost always lead to good design.
In this section, we will apply these concepts very practically. We will see how they connect to our random generators, and how they will help our design.
I see a functor in your randomness
Functions are functors. A function from R (for random) to A can be transformed into a function from R to B, if we have a function from A to B. We only need to compose these functions together. This is the definition of a Functor.
Since we design our random generators as callable from std::mt19937 to a value of type T (a kind of function), we can apply this useful mathematical construct here. We call it transform_gen and implement it as function composition:
template<typename Mapper, typename Generator> | |
auto transform_gen(Mapper map, Generator generator) | |
{ | |
return [=](std::mt19937& bit_gen) | |
{ | |
return map(generator(bit_gen)); | |
}; | |
} |
This powerful construct allows us to create new random generators by taking existing generators and composing them with a transformation function. Using it, we can for instance generate random square numbers between 0 and 100:
// Square number generator | |
auto square_gen = | |
transform_gen([](int i) { return i * i; }, int_gen(0, 10)); | |
// Testing in std::cout | |
for (int i = 0; i < 10; ++i) | |
std::cout << square_gen(bit_gen) << ","; | |
std::cout << '\n'; | |
// Outputs: | |
// 64,36,9,16,81,0,100,64,64,25, |
Please note we are not limited to such scalar-to-scalar mappings. We could for instance generate a range [0, N) from a random generator N the same way.
The road to Applicative
When we discover a Functor in our code, a great habit is to immediately look for the next more powerful pattern: Applicative Functors.
One way to see Applicative is as a generalisation of Functors on multiple arguments. Our Functor was able to transform a single generator into one generator. Our Applicative Functor will be able to combine several generators into one generator.
It might look obscure if you never tried Haskell (you should, it is glorious), but it is very easy to implement. We call this function apply_gen:
template<typename Finalizer, typename... Generators> | |
auto apply_gen(Finalizer finalizer, Generators... generators) | |
{ | |
return [=](std::mt19937& bit_gen) | |
{ | |
return finalizer(generators(bit_gen)...); | |
}; | |
} |
- The N generators are combined into one single generator
- The resulting generator triggers the N generators and feed finaliser
- The finalizer is a function that takes the output of the N generators
Applicative, the key to custom types
Because of its ability to compose N generators and feed them to an arbitrary function, our Applicative Functor allows to create generators for custom data structure, by using a constructor or factory function as finalizer.
We can demonstrate this with a simple example. The following piece of code shows out to build a generator for rgb_color, in just 3 lines of code:
struct rgb_color | |
{ | |
int m_red; | |
int m_green; | |
int m_blue; | |
}; | |
auto color_gen = int_gen(0, 255); | |
auto rgb_color_gen = apply_gen(to_object<rgb_color>(), | |
color_gen, color_gen, color_gen); |
We used to_object as a helper function (also provided in tiny_rand) to create a function that calls the constructor of a struct, with a variadic number of arguments provided as parameters:
template<typename Output> | |
struct to_object | |
{ | |
template<typename... Args> | |
Output operator()(Args&& ... args) const | |
{ | |
return Output { std::forward<Args&&>(args)...}; | |
}; | |
}; |
We can give it a try to convince ourselves that it works:
{ red: 84, green: 202, blue: 150 } | |
{ red: 252, green: 70, blue: 122 } | |
{ red: 55, green: 103, blue: 164 } | |
{ red: 209, green: 234, blue: 12 } | |
{ red: 22, green: 125, blue: 12 } |
Tuples: special case of Applicative
One specific use case of Applicative Functors is the generation of tuples of arbitrary types. We could do it by using std::make_tuple as finalizer of apply_gen.
Because these specific generators are likely to used frequently, we can write them some specialized implementation to help our client:
template<typename LhsGenerator, typename RhsGenerator> | |
auto pair_gen(LhsGenerator lhs_gen, RhsGenerator rhs_gen) | |
{ | |
return [=](std::mt19937& bit_gen) | |
{ | |
return std::make_pair(lhs_gen(bit_gen), rhs_gen(bit_gen)); | |
}; | |
} | |
template<typename... Generators> | |
auto tuple_gen(Generators... generators) | |
{ | |
return [=](std::mt19937& bit_gen) | |
{ | |
return std::make_tuple(generators(bit_gen)...); | |
}; | |
} |
From this, we can create a generator of 3D coordinates, each coordinate component between -10 and 10, in just a couple lines of code:
auto coord_1d_gen = double_gen(-10, 10); | |
auto coord_3d_gen = tuple_gen(coord_1d_gen, coord_1d_gen, coord_1d_gen); |
Containers random generators
In the last section, we implemented the first building blocks of our library. The two functions we implemented already lead us pretty far. The power of Functors and Applicative allows us to create custom data structures from other generators.
In the previous examples, we started to notice the need for random generators of containers. For instance, we used string_gen to generate random valid C++ identifiers. In this section, we will implement these generators.
Pulling the Strings
We start with the random generation of std::string. To remain flexible regarding the set of characters that can appear in the string, our string_gen combinator takes as input a random generator of character.
We want the size of the string to be random as well. To do so, we define two overloads that strike a balance between flexibility and ease of use.
The first overload gives us full flexibility: we take a SizeGenerator as input to generate the length of the generated string. This allows to choose any distribution from the STL and not necessarily stick to the uniform one.
But in the most general case, the user will only want to specify a maximum length for the generated string. So we offer an overload for this case as well.
template<typename SizeGenerator, typename CharGenerator> | |
auto string_gen(SizeGenerator size_gen, CharGenerator char_generator) | |
{ | |
return [=](std::mt19937& bit_gen) -> std::string | |
{ | |
std::string out(size_gen(bit_gen), ' '); | |
details::repeat_n_gen(out.begin(), out.size(), char_generator, bit_gen); | |
return out; | |
}; | |
} | |
template<typename CharGenerator> | |
auto string_gen(int max_size, CharGenerator char_generator) | |
{ | |
return string_gen(int_gen(0, max_size), char_generator); | |
} |
This function makes use of the repeat_n_gen helper functions, that allows to fill a container with repetitive calls to a random generator:
template<typename ValueGenerator, typename OutputIterator> | |
void repeat_n_gen(OutputIterator out, int count, | |
ValueGenerator value_gen, | |
std::mt19937& bit_gen) | |
{ | |
std::generate_n(out, count, | |
[&] { return value_gen(bit_gen); }); | |
} |
Vector, List, Deque…
All sequential containers can be implemented almost the same way we did for the string random generator. Here is the implementation for the std::vector:
template<typename SizeGenerator, typename ValueGenerator> | |
auto vector_gen(SizeGenerator size_gen, ValueGenerator value_gen) | |
{ | |
return [=](std::mt19937& bit_gen) | |
{ | |
int count = size_gen(bit_gen); | |
std::vector<decltype(value_gen(bit_gen))> out; | |
out.reserve(count); | |
details::repeat_n_gen(std::back_inserter(out), count, value_gen, bit_gen); | |
return out; | |
}; | |
} | |
template<typename ValueGenerator> | |
auto vector_gen(int max_size, ValueGenerator value_gen) | |
{ | |
return vector_gen(int_gen(0, max_size), value_gen); | |
} |
The main difference between the containers is the use of the reserve that is not available for all containers.
Associative containers
Associative containers such as map or set are a bit more tricky to design. In particular, the size of the container is a bit problematic.
Should we ask for a size and try to insert more keys until we reach that size? This might be problematic if the required size is bigger than the number of possible inhabitants for the keys. Doing so might create an infinite loop.
For this reason, these generators will not ask for a given length but for a given number of rolls of the random generator for keys.
Here is the implementation for the std::unordered_map:
template<class RollCountGenerator, class KeyGenerator, class ValueGenerator> | |
auto unordered_map_gen(RollCountGenerator roll_count_gen, | |
KeyGenerator key_gen, | |
ValueGenerator value_gen) | |
{ | |
return [=](std::mt19937& bit_gen) | |
{ | |
int count = roll_count_gen(bit_gen); | |
std::unordered_map<decltype(key_gen(bit_gen)) | |
,decltype(value_gen(bit_gen))> out; | |
out.reserve(count); | |
details::repeat_n_gen(std::inserter(out, out.end()), | |
count, pair_gen(key_gen, value_gen), bit_gen); | |
return out; | |
}; | |
} | |
template<typename KeyGenerator, typename ValueGenerator> | |
auto unordered_map_gen(int max_rolls, | |
KeyGenerator key_gen, | |
ValueGenerator value_gen) | |
{ | |
return unordered_map_gen(int_gen(0, max_rolls), key_gen, value_gen); | |
} |
Note that reserve that cannot be used for std::map. This is again the main difference between the associative containers.
Enjoying container generators
Using these combinators, we can create a generator of unordered map from strings (like player names) to 3D coordinates (the location of these players on a game map) in only 4 lines of code:
auto words_gen = string_gen(10, letter_gen()); | |
auto coord_1d_gen = double_gen(-10, 10); | |
auto coord_3d_gen = tuple_gen(coord_1d_gen, coord_1d_gen, coord_1d_gen); | |
auto string_to_coord_gen = unordered_map_gen(10, words_gen, coord_3d_gen); |
We can generate a sample map for fun, and try to iterate it to convince ourselves that it works fine. Here is a possible output:
[ pJAuOuiyj, [-2.1587, -7.48616, -3.2928] ] | |
[ IwExd, [-8.16682, 7.071, -4.79604] ] | |
[ BfHXZgzy, [-8.47121, 8.50638, -3.11689] ] |
Sum types, our last combinator
The previous sections described a set of random generator combinators that allow to generate custom data types and containers. There is however still one missing piece: the support for sum types.
Sum types
Sum types are very useful constructs in software development and arise in C++ through different forms.
The most commonly known form is through polymorphism (*): if B and C inherits from A, then ideally, we should be able to combine random generators of B and C to create a random generator of A.
Variants are the second form of sum types known from C++ programmers. For instance, we can define A as a boost::variant of B and C. Again, we should ideally be able to combine random generators of B and C to create a random generator of A.
(*) This is not exactly true. Polymorphism is not the same thing as a sum type. But it can be used (and was used commonly before boost::variant) to implement it, in combination with the Visitor design pattern
Implementation
We want to support different forms of sum types. We also had for goal to avoid introducing dependencies to Boost. So we have to find a way to generate a variant without having to depend on it.
The solution is to abstract these concerns away under the notion of a finalizer function, as we did for the Applicative. This function will be called on the production of one of the generator passed as input of our one_of_gen random generator combinator.
The implementation makes use of type erasure through the use of std::function (there are probably ways around this) to store all the generators inside a vector. It then randomly selects one to generate a value and gives it to the finalizer:
template<typename Finalizer, typename Generator, typename... Generators> | |
auto one_of_gen(Finalizer finalizer, Generator head, Generators... tail) | |
{ | |
using Out = decltype(finalizer(head(std::declval<std::mt19937&>()))); | |
std::vector<std::function<Out(std::mt19937&)>> gens{ | |
transform_gen(finalizer, head), | |
transform_gen(finalizer, tail)... | |
}; | |
return [=](std::mt19937& bit_gen) | |
{ | |
std::uniform_int_distribution<int> distribution(0, gens.size() - 1); | |
return gens[distribution(bit_gen)](bit_gen); | |
}; | |
} |
Demo time!
Using the finalizer appropriately, we now have the support for the random generator of sum types by combining generators of each of its parts.
But the pattern is more general: the finalizer can be used to create an std::optional, a boost::variant, or anything else. For instance, weird_gen is a weird way to implement a generator of integer values between -10 and 10:
- The integer generator will produce values from -10 to 0
- The string generator will produces string with maximum length 10
- The finalizer maps each of the string to their length
struct finalizer | |
{ | |
int operator() (int i) const { return i; } | |
int operator() (std::string const& s) const { return s.size(); } | |
}; | |
auto weird_gen = one_of_gen(finalizer{}, | |
int_gen(-10, 0), | |
string_gen(10, letter_gen())); |
For sure, nobody sane would implement such a thing, but that should gives you a taste of what it is possible to with one_of_gen. For instance, we could generate random events and send it to an event handler.
Enjoying our hard work
We are done. The resulting library is available as tiny_rand on my GitHub. The samples directory shows some more examples of use cases.
We will conclude this post with one such more elaborate example. Let us imagine we have a game object, that contains an integer for the current round number, and a map from player names to their respective 3D position:
using PlayerName = std::string; | |
using Coordinate = std::tuple<double, double, double>; | |
struct Game | |
{ | |
int m_current_round; | |
std::map<PlayerName, Coordinate> m_players_location; | |
}; |
Here is how we can generate a random instance of Game, using the combinators we presented throughout this post:
auto round_gen() | |
{ | |
return int_gen(0, std::numeric_limits<int>::max()); | |
} | |
auto player_name_gen() | |
{ | |
return string_gen( | |
int_gen(1, 30), //Player name cannot be empty | |
alphanum_gen() //Restriction on the character set | |
); | |
} | |
auto coord_3d_gen(double map_size) | |
{ | |
auto coord_1d_gen = double_gen(0, map_size); | |
return tuple_gen(coord_1d_gen, coord_1d_gen, coord_1d_gen); | |
} | |
auto game_gen(int max_player, double map_size) | |
{ | |
return apply_gen( | |
to_object<Game>(), //Combine into a game | |
round_gen(), | |
sorted_map_gen(max_player, | |
player_name_gen(), | |
coord_3d_gen(map_size)) | |
); | |
} |
Conclusion, and what’s next
Throughout this post, we built a small random generator library to complete the STL random header with additional features to generate any kind of data. This library is available on my GitHub. Any feedback (suggestions or criticisms) are welcome.
Small core, Big reach
We built the API based on a small set of core ideas, which proved pretty powerful and sufficient to go all the way:
- A random generator is a function from std::mt19937 to a value
- Choosing to combine generators by direct injection to compose them
- Embracing the power of Functor, Applicative and Sum types
It demonstrates that surprisingly basic constructs can take us a long way, and that Haskell has some pretty useful design techniques ready for us to use.
Further improvements
There are some improvements that could be done on the library and its implementation, and on which I plan to work, among which stands:
- Complete the choice_gen with weighted_choice_gen to add weights
- Improve random char generation: it only works for small character sets
- The use of std::function, which could be replaced by a less overhead construct
- An overall work on performance and in particular the number of copies
- The generators created cannot easily be inspected (at compile time)
Further further improvements
After having read the great post An Introduction to Reflection in C++, I think it would be worth to find a way to automatically create generators from introspection.
Especially for enumerations, where we end up saying things twice:
enum class db_relation | |
{ | |
one_to_one, | |
one_to_many, | |
many_to_many | |
}; | |
// Should be automated | |
std::vector<db_relation> all_relations = { | |
db_relation::one_to_one, | |
db_relation::one_to_many, | |
db_relation::many_to_many | |
}; | |
// Creating the generator, picking among the all_relations vector | |
auto relation_generator = tiny_rand::choice_gen(all_relations); |
In some cases, creating the generators by introspection would not be desirable: as shown through the examples of this post, we often need additional parameters to tune the random generation.
But it would probably cover quite a lot of use cases nevertheless and could be implemented as a different library.
Closing words
Any feedback regarding the features, the design choices or the implementation are welcome. Feel free to reach me on Twitter.