This post is the second of the series of post focused on providing useful higher order functions that goes hand in hand with STL algorithms and ranges.
Last post was focused on higher order functions, lenses, that allows to zoom on the elements being scanned.
Today’s post is about exploring higher order functions that allows to combine several traversals into one single traversal, bringing better performance through composability. The best name I found for these functions are traversal combinators.
Motivation
The post of today will be based on a simple example: we have a vector of string as input, and we would like to know, for each string, the number of vowels it holds and the number of character it has.
Assuming the following functions already exist:
size_t get_length(std::string const& s) { return s.size(); } bool is_vowel(char c) { static const char vowels[] = "aeiou"; return end(vowels) != find(begin(vowels), end(vowels), c); } size_t count_vowels(std::string const& s) { return count_if(begin(s), end(s), is_vowel); }
Here is one way to implement it:
vector<size_t> lengths; vector<size_t> vowels; transform(begin(strings), end(strings), back_inserter(lengths), get_length); transform(begin(strings), end(strings), back_inserter(vowels), count_vowels);
This code is not really nice. There is quite a bit of noise. And there are two traversals of our string container, which might be problematic for several reasons.
- It might be a performance issue if the range is pretty long
- We might also be in a situation that forbids two traversals (input iterators)
To solve this double traversal problem, we might create a brand new lambda to combine our two traversals in one single traversal:
std::vector<std::tuple<size_t, size_t>> out; std::transform( begin(strings), end(strings), std::back_inserter(out), [](std::string const& s) { return std::make_tuple(s.size(), count_vowels(s)); });
This is fine, it works, but it is a bit noisy. Can we do better?
Pairing unary functions
If we look at our previous lambda, we see that our problem has to do with combining several functions together. As we did in the previous post, the goal is to identify and factorize the generic parts to remove the noise at call site.
To do so, we can write a higher order function that allows to group several transformations and return a transformation that collect their results as a tuple. I named it par_tf for “parallel transformations”:
template<typename... Fs> auto par_tf(Fs... fs) { return [fs...] (auto const& e) { return std::make_tuple(fs(e)...); }; }
We can use this par_tf function to collect several kinds of results in one traversal, with less noise than with our custom lambda:
std::vector<std::tuple<size_t, size_t>> out; std::transform( begin(strings), end(strings), std::back_inserter(out), par_tf(get_length, count_vowels));
With this traversal combinator, we get the benefits of having a single traversal but with less noise than when using lambdas.
More unary functions
We are not at all limited to pairing transformations, we can combine functions in many other interesting ways.
Pairing predicates: we can pair predicates to express conjunction or disjunctions of predicates. We could call these functions and_pred and or_pred to indicate they implement the boolean logic on functions returning bools.
template<typename... Ps> auto and_pred(Ps... ps) { return [ps...] (auto const& e) -> bool { return (... && ps(e)); }; } template<typename... Ps> auto or_pred(Ps... ps) { return [ps...] (auto const& e) -> bool { return (... || ps(e)); }; }
For example, we use it to filter out strings that are big or small, keeping only the medium sized strings:
vector<std::string> out; remove_copy_if( begin(strings), end(strings), back_inserter(out), or_pred(big_string, small_string));
You can find a more detailed work on the subject, with an analysis of the performance implications or playing with such higher order function in the following post.
Conditional transformations: we can combine two different transformations and choose which one to select based on a predicate (basically a if between functions).
template<typename Cond, typename Consequence, typename Alternative> auto alt(Cond cond, Consequence cons, Alternative alt) { return [cond, cons, alt] (auto& e) { return cond(e) ? cons(e) : alt(e); }; }
For example, we might want to traverse a list of string and count their vowel. But when the string is large enough, we know an estimation would be enough. This can be translated in code like this:
size_t estimate_vowel_count(std::string const& s) { return s.size() / 3; } std::vector<size_t> out; std::transform( begin(strings), end(strings), std::back_inserter(out), alt(big_string, estimate_vowel_count, count_vowels));
Pairing binary functions
Do we have to stay limited to unary functions? How would we deal with std::accumulate, the most generic algorithm of the STL?
Nothing prevents us from combining binary functions (or more). In the case of std::accumulate, we might call the functions steps, and the process of combining two of them pair_steps:
template<typename F1, typename F2> auto pair_steps(F1 f1, F2 f2) { return [f1, f2] (auto const& res, auto const& e) { return std::make_tuple( f1(std::get<0>(res), e), f2(std::get<1>(res), e)); }; }
We can use this to scan over our list of string, and count the total size of the strings, in combination with the total number of vowels.
auto sum_vowels = [] (size_t len, std::string const& s) { return len + count_vowels(s); }; auto sum_length = [] (size_t len, std::string const& s) { return len + s.size(); }; auto res = std::accumulate( begin(strings), end(strings), std::make_tuple(0, 0), pair_steps(sum_vowels, sum_length));
What if we wanted to combine more than 2 functions and call it par_steps?
Thanks to the posts on unpacking tuple and some posts from C++ thruths, I managed to implement it the following way (there might be an easier way though):
template< typename Fs, typename Res, typename Value, typename std::size_t... Is > auto parallel_apply( Fs const& fct_tuple, Res const& res, Value const& e, std::index_sequence<Is...>) { return std::make_tuple( std::get<Is>(fct_tuple)(std::get<Is>(res), e)...); } template<typename... Fs> auto par_steps(Fs... fs) { return [fs...] (auto const& res, auto const& e) { auto fct_tuple = std::forward_as_tuple(fs...); auto seq = std::make_index_sequence< std::tuple_size<decltype(fct_tuple)>::value >(); return parallel_apply(fct_tuple, res, e, seq); }; }
Using this, we can combine several of our data analysis functions into one single traversal to extract all information at once:
auto res = std::accumulate( begin(strings), end(strings), std::make_tuple(0, 0, 0), par_steps(sum_vowels, sum_length, sum_blanks));
Considering that accumulate is one of the most generic algorithm (check this video from Ben Deane if you want to know how more on this subject), this is quite a useful traversal combination function.
What’s next?
We have seen another set of tools that combines pretty well with STL algorithms and range libraries.
Together with the lenses function, traversal combinators allows us to perform pretty complex data transformations, with relative ease, while keeping our code modular and efficient.
In the next post, we will explore how we can use these techniques in more complex examples. And explain how they help to think differently about writing code dealing with large collections.