Catamorph your DSL: C++ Port

In the previous posts, we went over the process of building a DSL for arithmetic operations and we introduced the concept of Catamorphism as a way to decouple the traversal of the AST from the operations we want to perform on it.

We saw how it could help us compose operations before traversing the AST of our DSL, leading to more efficient composition and better testability.

Unless you are quite familiar with catamorphisms, you might want to read the previous posts of this series first.

All these previous posts were based on Haskell. Because Haskell is such a special (wonderful) language, the last post started to explore whether this concept would port as nicely to other languages.

We translated our Haskell code into Clojure, a functional language that puts much less emphasis on types than Haskell. We saw that this concept translated into a post-order depth-first search. The resulting Clojure code was both short and elegant.

This post will keep exploring the limits of the applicability of this concept, by trying to bring it into C++.

 

Boost to the rescue


We could implement our DSL in C++ with only the STL available, using design patterns such as the Visitor.

To keep the code short and concise, we will however rely on Boost as well. All the code that follows will assume the following Boost header inclusions:
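The original inclusion list is not reproduced here; a plausible set, covering both the Boost and standard headers used by the sketches in this post, would be:

```cpp
// Hypothetical inclusion list: the exact set used in the original post may differ.
#include <boost/algorithm/string/join.hpp>
#include <boost/range/algorithm/transform.hpp>
#include <boost/variant.hpp>
#include <boost/variant/recursive_wrapper.hpp>

#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <map>
#include <numeric>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>
```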

 
In particular, boost::variant will be one of the cornerstones of the design, as it allows us to avoid the Visitor design pattern. You might want to have a look at the documentation of boost::variant before proceeding.

The rest of the headers are mostly there for convenience. We could do without them, but at the expense of some more code to write.

 

Arithmetic expression in C++/Boost


There are many ways we could represent our expression DSL in C++. We could choose to use the Visitor design pattern for example. The solution below is based instead on boost::variant, for convenience.

The expression_r template implements our open recursive data type. It is a boost::variant over:

  • nb which represents an integer constant
  • id which represents a variable identifier
  • add_op which represents an addition operation
  • mul_op which represents an multiplication operation

The add_op and mul_op operations are themselves factored into a single tagged op data type, with two tags, add_tag and mul_tag, representing the two operations.
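A sketch of what this type can look like (the exact shape of op, in particular its member functions, is a guess):

```cpp
using nb = int;           // integer constants
using id = std::string;   // variable identifiers

struct add_tag {};
struct mul_tag {};

// A single operation type, parameterized by its tag and by the type R of its operands
template <typename Tag, typename R>
struct op
{
   op() = default;
   explicit op(std::vector<R> rands) : m_rands(std::move(rands)) {}
   std::vector<R> const& rands() const { return m_rands; }
private:
   std::vector<R> m_rands;
};

template <typename R> using add_op = op<add_tag, R>;
template <typename R> using mul_op = op<mul_tag, R>;

// The open recursive data type: the recursion is abstracted behind the parameter R
template <typename R>
using expression_r = boost::variant<nb, id, add_op<R>, mul_op<R>>;
```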

 
To get our recursive expression, we will use the same trick as we used in Haskell: we will compute the fixed point of the type recursion.

As in Haskell, C++ is not able to create infinite types. The Fix wrapper type Haskell needed to circumvent this limitation is replaced here by boost::recursive_wrapper. The overall design is the same:
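A sketch of the resulting fixed point:

```cpp
// expression closes the recursion of expression_r, in the same way Fix does in Haskell.
// boost::recursive_wrapper breaks the infinite size that a plain member would create.
struct expression
   : boost::recursive_wrapper<expression_r<expression>>
{
   using boost::recursive_wrapper<expression_r<expression>>::recursive_wrapper;
};
```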

 
Note: this naive implementation triggers deep copies at each level of our AST. This might lead to quadratic behaviour in the depth of our tree when transforming the AST. A production-ready implementation would have to handle these aspects, which we voluntarily left out for simplicity. We will come back to this issue in the conclusion.

 

Factory functions and accessors


The trouble with constructors of class templates in C++ is that they require their template arguments to be listed explicitly in the code. The usual trick is to define factory functions outside of the class to deduce the template parameters automatically.

We will define some factory functions to help us write code more easily. These functions are somewhat similar to the “smart constructors” of Haskell:
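A sketch of these factory functions; the names (cst, var, add, mul) follow the Haskell posts and are plausible guesses for the C++ version:

```cpp
// Hypothetical smart constructors
expression cst(int i)       { return expression(expression_r<expression>(i)); }
expression var(id const& s) { return expression(expression_r<expression>(s)); }

expression add(std::vector<expression> const& rands)
{
   return expression(expression_r<expression>(add_op<expression>(rands)));
}

expression mul(std::vector<expression> const& rands)
{
   return expression(expression_r<expression>(mul_op<expression>(rands)));
}
```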

 
Having introduced this layer of “named constructors” allows us to create expressions more succinctly and to better declare our intent:
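For instance, something along these lines:

```cpp
// (1 + x + 3) * (y + 2), built without spelling out a single template argument
expression e = mul({ add({ cst(1), var("x"), cst(3) }),
                     add({ var("y"), cst(2) }) });
```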

 
Similarly, to “pattern match” on our boost::variant, we will introduce some helper accessor functions. Indeed, the trouble with boost::get is that it requires explicit template arguments to be listed in the code. Dedicated, well-named functions will simplify the pattern matching greatly by removing noisy templates from our code:
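A sketch of these accessors, each returning a pointer that is null when the alternative does not match (which is exactly what boost::get does when given a pointer); the names are guesses:

```cpp
template <typename T>
nb const* get_as_cst(expression_r<T> const& e) { return boost::get<nb>(&e); }

template <typename T>
id const* get_as_var(expression_r<T> const& e) { return boost::get<id>(&e); }

template <typename T>
add_op<T> const* get_as_add(expression_r<T> const& e) { return boost::get<add_op<T>>(&e); }

template <typename T>
mul_op<T> const* get_as_mul(expression_r<T> const& e) { return boost::get<mul_op<T>>(&e); }
```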

 
Note: boost::variant also offers visitors to handle pattern matching as an alternative to boost::get. But as our interpreters will not need to exhaustively match all cases, sticking to boost::get ends up being less code. This is ultimately a matter of taste: we could have made the other choice instead.

 

Our C++ Catamorphism


Now that we have our expression_r type, we will need to write the equivalent of the catamorphism function cata of Haskell. This function will be responsible for lifting local AST transformations into operations on the whole AST.

The details on how to build a catamorphism from an open recursive type were already covered in the previous post on catamorphism in Haskell. This section will only aim at translating this code into C++.

 

Functor Instance

The first step is to make our expression_r type an instance of a Functor. In C++, one way to do it is to implement fmap for our type:

  • The map argument represents the transformation to apply
  • Constants and variables are left unchanged by this transformation
  • For operations, we apply the transformation to each sub-expression

We use Boost.Range algorithms to make the code more succinct:
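A sketch of such an fmap (the error handling at the end is a guess):

```cpp
// Applies `map` to every direct sub-expression; constants and variables go through untouched.
template <typename A, typename M>
auto fmap(M map, expression_r<A> const& e)
{
   using B = decltype(map(std::declval<A>()));
   using Out = expression_r<B>;

   if (auto const* i = get_as_cst(e)) return Out(*i);
   if (auto const* v = get_as_var(e)) return Out(*v);

   if (auto const* o = get_as_add(e))
   {
      std::vector<B> sub_results;
      boost::transform(o->rands(), std::back_inserter(sub_results), map);
      return Out(add_op<B>(std::move(sub_results)));
   }
   if (auto const* o = get_as_mul(e))
   {
      std::vector<B> sub_results;
      boost::transform(o->rands(), std::back_inserter(sub_results), map);
      return Out(mul_op<B>(std::move(sub_results)));
   }
   throw std::logic_error("Non exhaustive pattern match in fmap");
}
```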

 

Catamorphism

Next, we need to write the catamorphism function, whose goal is to apply fmap at each level of our expression, starting from the lower levels.

The details behind the inner workings of this function are provided in a previous post on Haskell. We only translate that definition to C++ here:

  • We replace function composition in Haskell with nested function calls in C++
  • We use get to extract the expression from the recursive_wrapper

Some of the types listed below are deduced automatically. We left them in the code to show what is going on under the hood:
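A sketch of the C++ cata:

```cpp
// Applies the algebra `f` to the whole AST, bottom-up.
// Out is the carrier type of the algebra and has to be provided explicitly.
template <typename Out, typename Algebra>
Out cata(Algebra f, expression const& ast)
{
   return f(
      fmap(
         [&f](expression const& e) -> Out { return cata<Out>(f, e); },
         ast.get()));   // get() unwraps the recursive_wrapper
}
```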

One noticeable drawback of this implementation is the need to provide the output type of the cata function explicitly. This could probably be dealt with.

 

Pretty printing our expression in C++


We now have everything to start implementing our first interpreter, our expression pretty printer.

It is noticeably more verbose than in both Haskell and Clojure, but not that much. I was quite surprised by the overall concision:

  • print_alg handles pattern matching and the two trivial cases
  • print_op handles the operations by joining the sub-expression strings, prepending the operation and wrapping the whole in parentheses
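A sketch of these two functions:

```cpp
// Joins the already-printed operands, prepends the operator and adds parentheses
std::string print_op(std::vector<std::string> const& rands, char op_repr)
{
   return "(" + std::string(1, op_repr) + " " + boost::algorithm::join(rands, " ") + ")";
}

// The pretty-printing algebra: each node becomes a string,
// assuming its sub-expressions have already been turned into strings.
std::string print_alg(expression_r<std::string> const& e)
{
   if (auto const* i = get_as_cst(e)) return std::to_string(*i);
   if (auto const* v = get_as_var(e)) return *v;
   if (auto const* o = get_as_add(e)) return print_op(o->rands(), '+');
   if (auto const* o = get_as_mul(e)) return print_op(o->rands(), '*');
   throw std::logic_error("Non exhaustive pattern match in print_alg");
}
```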

 
At call site, the code is not that verbose either. The main drawback is the need to specify the output type of the catamorphism (std::string in this case):
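For instance (the output shown matches the sketch above, not necessarily the article's exact formatting):

```cpp
expression e = add({ cst(1), var("x"), mul({ cst(2), var("y") }) });
std::cout << cata<std::string>(print_alg, e) << '\n';
// prints: (+ 1 x (* 2 y))
```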

Now, this is not the most efficient code that we could come up with. It does create quite a lot of intermediate strings. This is a subject for another time.

 

Eval and dependencies


We can translate our next two most straightforward interpreters, eval and dependencies, to C++ as well.

The eval function needs an environment holding the value of the variables in the arithmetic expression. To make it simple and keep the code short, we will:

  • Rely on a std::map from id to integer to model this environment
  • Ignore errors such as an unbound variable in the environment

In addition, and to compensate for the absence of currying in C++, our eval function returns a lambda:
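A sketch of what this can look like (the names eval_alg and eval are guesses):

```cpp
using env = std::map<id, int>;

// Returns the evaluation algebra as a lambda closing over the environment
auto eval_alg(env const& environment)
{
   return [&environment](expression_r<int> const& e) -> int
   {
      if (auto const* i = get_as_cst(e)) return *i;
      if (auto const* v = get_as_var(e)) return environment.at(*v);   // unbound variables are not handled
      if (auto const* o = get_as_add(e))
         return std::accumulate(o->rands().begin(), o->rands().end(), 0);
      if (auto const* o = get_as_mul(e))
         return std::accumulate(o->rands().begin(), o->rands().end(), 1, std::multiplies<int>());
      throw std::logic_error("Non exhaustive pattern match in eval_alg");
   };
}

int eval(env const& environment, expression const& e)
{
   return cata<int>(eval_alg(environment), e);
}
```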

 
The dependencies function lists all the variables of the expression. Its sub-expressions are transformed into sets of variable identifiers, which are combined (by set union) while climbing up the AST.
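A sketch of the corresponding algebra:

```cpp
// Each node is mapped to the set of variables appearing below it
std::set<id> dependencies_alg(expression_r<std::set<id>> const& e)
{
   if (auto const* v = get_as_var(e)) return { *v };

   std::set<id> out;
   if (auto const* o = get_as_add(e))
      for (auto const& sub : o->rands()) out.insert(sub.begin(), sub.end());
   if (auto const* o = get_as_mul(e))
      for (auto const& sub : o->rands()) out.insert(sub.begin(), sub.end());
   return out;   // constants have no dependencies
}

std::set<id> dependencies(expression const& e)
{
   return cata<std::set<id>>(dependencies_alg, e);
}
```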

 

Composable C++ optimizations


Until now, the main advantage of introducing Catamorphisms was the ability to specify local transformations, which the cata function could lift into operations on the whole AST of our arithmetic DSL.

We will now leverage the other advantage of Catamorphisms, the ability to compose them efficiently, to implement our arithmetic AST optimization functions.

We left the detailed implementation of the has_zero and optimize_op functions out of the snippet below; it is available on GitHub.

This allows us to concentrate on the real value added by Catamorphisms, the composability of transformations:

  • opt_add_alg is only concerned with optimising additions
  • opt_mul_alg is only concerned with optimising multiplications
  • optimize_alg is only concerned with combining these two algebras into one

This leads to the following code:
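The sketch below also reconstructs plausible versions of has_zero and optimize_op (needed for the later examples to run); the real implementations on GitHub may differ:

```cpp
// Does any operand reduce to the constant 0? (hedged guess at the real helper)
bool has_zero(std::vector<expression> const& rands)
{
   return std::any_of(rands.begin(), rands.end(),
      [](expression const& sub) {
         auto const* i = get_as_cst(sub.get());
         return i && *i == 0;
      });
}

// Folds the constant operands with `binop` (starting from the neutral element)
// and rebuilds the operation around whatever operands remain.
template <typename BinOp>
expression optimize_op(std::vector<expression> const& rands, int neutral, BinOp binop,
                       expression (*make_op)(std::vector<expression> const&))
{
   int folded = neutral;
   std::vector<expression> remaining;
   for (auto const& sub : rands)
   {
      if (auto const* i = get_as_cst(sub.get())) folded = binop(folded, *i);
      else                                       remaining.push_back(sub);
   }
   if (remaining.empty())     return cst(folded);
   if (folded != neutral)     remaining.insert(remaining.begin(), cst(folded));
   if (remaining.size() == 1) return remaining.front();
   return make_op(remaining);
}

// Local optimisation of additions only
expression opt_add_alg(expression_r<expression> const& e)
{
   auto const* o = get_as_add(e);
   return o ? optimize_op(o->rands(), 0, std::plus<int>(), add) : expression(e);
}

// Local optimisation of multiplications only
expression opt_mul_alg(expression_r<expression> const& e)
{
   auto const* o = get_as_mul(e);
   if (!o) return expression(e);
   if (has_zero(o->rands())) return cst(0);
   return optimize_op(o->rands(), 1, std::multiplies<int>(), mul);
}

// Combining the two algebras into a single one
expression optimize_alg(expression_r<expression> const& e)
{
   return opt_add_alg(opt_mul_alg(e).get());
}
```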

 
Note how the optimize_alg function is able to combine the two algebras by composing them. The call to get unwraps the expression produced by the first algebra before the second algebra is applied. We can give it a try:
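For instance, with the sketches above:

```cpp
expression e = add({ var("x"), cst(0), mul({ cst(1), var("y") }) });
expression o = cata<expression>(optimize_alg, e);
std::cout << cata<std::string>(print_alg, o) << '\n';
// prints: (+ x y)
```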

 

Partial evaluation


Our last remaining challenge is to implement the partial evaluation of an arithmetic expression.

The partial_eval_alg simply replaces variables bound in the evaluation environment with their respective values. As with our evaluation function, our partial evaluation returns a lambda to compensate for the lack of currying in C++.
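A sketch of this algebra:

```cpp
// Bound variables become constants; everything else is rebuilt unchanged
auto partial_eval_alg(env const& environment)
{
   return [&environment](expression_r<expression> const& e) -> expression
   {
      auto const* v = get_as_var(e);
      if (!v) return expression(e);

      auto it = environment.find(*v);
      return it != environment.end() ? cst(it->second) : var(*v);
   };
}
```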

 
We can combine this function with our optimization functions to obtain a fully fledged partial evaluation that simplifies expressions in one single tree traversal.
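A sketch of this composition:

```cpp
// Partial evaluation and the optimisations, composed into a single algebra,
// and therefore performed in a single traversal of the AST
expression partial_eval(env const& environment, expression const& e)
{
   auto bind_vars = partial_eval_alg(environment);
   return cata<expression>(
      [&bind_vars](expression_r<expression> const& sub) {
         return optimize_alg(bind_vars(sub).get());
      },
      e);
}
```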

 
We can also re-implement our evaluation function in terms of our partial evaluation:

  • If the resulting expression, after optimizations, is a constant, we output it.
  • Otherwise, we can use our dependency function to output the missing variables

This leads to the following evaluation function, which handles errors much more nicely than our previous one:
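A sketch of this safer evaluation (the name eval2 and the error reporting details are guesses):

```cpp
int eval2(env const& environment, expression const& e)
{
   expression reduced = partial_eval(environment, e);
   if (auto const* i = get_as_cst(reduced.get()))
      return *i;

   std::ostringstream error;
   error << "Missing variables:";
   for (auto const& var_id : dependencies(reduced)) error << ' ' << var_id;
   throw std::logic_error(error.str());
}
```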

We can give it a try to convince ourselves that it works:
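For instance, with the sketches above:

```cpp
expression e = add({ var("x"), mul({ cst(2), var("y") }) });

env partial_env = { { "y", 3 } };
std::cout << cata<std::string>(print_alg, partial_eval(partial_env, e)) << '\n';
// prints: (+ x 6)

env full_env = { { "x", 1 }, { "y", 3 } };
std::cout << eval2(full_env, e) << '\n';
// prints: 7
```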

 

Conclusion


In this post, we managed to port the Catamorphism notion to C++ to build a series of arithmetic DSL interpreters made of:

  • Composable local operations (our different algebras)
  • Decoupled from the traversal of the AST (handled by cata)
  • Avoiding any kind of mutation of our AST

The complete resulting code is about three times the length of the equivalent Haskell or Clojure code (320 lines give or take), which is quite good overall. But there is a catch: this implementation is unrealistically simplistic. This is the topic of the next section.

 

What’s the catch?

Our simplistic memory management did exhibit some problematic quadratic behaviour in terms of copies. Addressing this concern would require some manual work: C++ offers no support for immutable data structures with structural sharing in its standard library, so we would probably have to rely on move semantics.

We also had some remaining issues regarding type deduction, which would require some more tricks to improve. The biggest issue is, however, dealing with the long error messages GCC outputs upon the simplest type error. Try removing the .get() in the catamorphism function, or messing up the types of an algebra function, and you shall see. Concepts could probably help there.

So I do not think that the 320 lines of C++ code quite compare in maturity with the 120 corresponding lines of Haskell.

 

On Catamorphisms in C++

This concludes our study of the applicability of Catamorphisms in C++. Overall, I felt this design was not quite as natural as it was in Haskell or Clojure. It sometimes felt like going against the idiomatic use of the language, which is not a good sign.

I recently read some good articles on the functional paradigm bursting into C++ (like this series of posts on http://www.modernescpp.com).

Although it is true that C++ made great leaps in this direction by encouraging the use of higher order functions with the addition of lambdas, it felt awkward to use them in this context.

But the main difficulty in porting this concept to C++ is that it implicitly relies on efficient persistent data structures being available in the language. And the support for immutable data and persistent data structures in C++ is simply not there.

To summarize, implementing catamorphisms in C++ was quite a fun exercise (except for the compilation error messages spanning two or more screens), but it did not feel like it fits C++ as well as it fits Haskell or Clojure.

Combining traversals by example

This post is the third of a series of posts focused on providing useful higher order functions that go hand in hand with STL algorithms and ranges.

The two previous posts went over the following set of tools:

  • Lenses allow us to “zoom” in on the elements being scanned
  • Traversal combinators allow us to merge several scans into one single scan

Today’s post considers a more complex example, inspired by real work, and shows how these tools help in solving it.

I hope to convince you that these tools are useful, if not to put directly in practice, at least to help you consider different design choices.

 

Motivation


I work as a Back Office software engineer for a trading platform. Our platform receives a lot of financial transactions per day. These transactions contribute to the position of the bank, generate payments or confirmations, participate in the risk, etc.

We rely on stream processing during the day and switch to batch processing outside of working hours to consolidate the data. In batch mode, we need to handle a lot of transactions fast, many more than can fit in memory.

To do so, we parallelise the work across several engines. Each engine streams a set of transactions from the DB through the functions we want to execute on them (recompute positions, consolidate the risk, etc.).

The work performed by the engines can be seen as a traversal of the DB, running several computations on the data while scanning each transaction only once.

The problem description below is greatly inspired by this kind of problem.

 

Problem description


Instead of receiving a stream of financial transactions, we will receive a stream of stock values as strings. Each string will contain the name of the stock and its value.

Instead of consolidating positions or risk, our API client is interested in extracting trends from this data. He has a bunch of different metrics of interest, things like:

  • Finding the minimum value
  • Computing the average value
  • Counting the points far away from the initial stock value

Our client is not interested in running all of these metrics all the time. He wants to be able to cherry-pick them. Our goal is to provide an API to build these capabilities easily from the raw stream of stock values. We need to:

  • Parse these inputs
  • Be able to select a stock name
  • Extract the stock value
  • Combine the desired analytics on them

 

The terrible solution


Sit tight and be prepared for a true moment of horror. What follows is obviously a terrible solution, but I had to include it as:

  • It is unfortunately pretty common to see such patterns in our code.
  • It has a pedagogic interest: it shows us where not to go.

This is a design in which everything is packed together. The reading of the inputs, the filtering and the processing, the selection of the different metrics… everything is tightly coupled inside one function.

The usage of this code is quite obscure as well: the reader will have great trouble figuring out what each boolean value represents:
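The original snippet is not reproduced here; a hypothetical call site of the same flavour (compute_metrics and its parameters are invented purely for illustration) would look like this:

```cpp
// Which metrics do `true, false, true` actually enable? Impossible to tell from here.
auto results = compute_metrics(quotes, "AAPL", true, false, true, 2.0);
```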

 
So this is obviously awful code. And I claim that better naming, method extraction, stronger types (instead of booleans), none of this will really make it much less awful: we are dealing with an architectural problem.

But this bad design has its pedagogic merits. Inside a single function, it exposes all the things we want to avoid:

  • The source of data is coupled with the traversal
  • The algorithm is coupled with the parsing and filtering
  • The algorithm contains data specific to the metrics
  • The data of disabled metrics is still initialised
  • The output of the function contains non-computed metrics results
  • The client code is coupled with all metrics

As a result, adding a new metric might break the other metrics: everything will have to be re-tested again.

 

Our goals


Let us negate all the bad things that we identified from the previous code, to get an idea of what we are looking for:

  • A source of data decoupled from the traversal
  • An algorithm that does not know about the filtering nor the parsing
  • An algorithm agnostic of the data specific to the metrics
  • Disabled metrics should have no cost
  • Disabled metrics do not appear in the results
  • A client code only coupled with the metrics it uses

The rest of this post will try to come up with a solution that complies with all these desirable properties.

 

Taking care of the data source


Let us first start by separating the traversal (the algorithm we will later select) from both the source and the filtering / transformations we will perform on it. The following code provides a pure pipeline, implemented using Boost ranges:

Returning a lambda allows us to build a pure pipeline, agnostic of the data source. The client code will be able to plug its source into this pipeline and assemble it with the rest of the algorithm:
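A sketch of such a pipeline, using raw Boost range adaptors in place of the lens helpers (to, equaling) from the previous posts; stock_update and parse_stock_update are assumed names introduced here for illustration:

```cpp
#include <boost/range/adaptor/filtered.hpp>
#include <boost/range/adaptor/transformed.hpp>

struct stock_update
{
   std::string id;
   double      value;
};

stock_update parse_stock_update(std::string const& raw_line);   // parsing left out

// Returns a lambda: a reusable pipeline that the client plugs a source into
auto stock_values(std::string stock_id)
{
   return [stock_id](auto const& raw_lines)
   {
      return raw_lines
         | boost::adaptors::transformed(&parse_stock_update)
         | boost::adaptors::filtered(
              [stock_id](stock_update const& s) { return s.id == stock_id; })
         | boost::adaptors::transformed(
              [](stock_update const& s) { return s.value; });
   };
}
```

The client can then write something like auto values = stock_values("AAPL")(quotes), where quotes is its own container of raw strings.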

 
Using lens functions here (to and equaling) leads to shorter and more declarative code that almost reads like a sentence:

  • “read stock filtered by equaling ID on the stock id”
  • “transformed to the stock value”

 

Thinking in terms of collections


Now, we need to find a way to separate the algorithm from the metrics we would like to plug into it, all in one pass. This means that the following (although nicer) API would not solve our problem:

It would indeed implicitly force several scans of our data:
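Something along these lines, with one hypothetical function per metric (the names are invented for illustration):

```cpp
auto values = stock_values("AAPL")(quotes);

double lowest  = stock_minimum(values);           // first traversal
double average = stock_average(values);           // second traversal
int    big     = big_variations(values, 2.0);     // third traversal
```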

 
Thinking in terms of collections, as encouraged by Mike Acton, does not necessarily mean taking collections as input.

In the previous code snippet, you might have noticed that we assembled our pipeline with the source each time we called an algorithm. We had to. Const ranges are not const themselves: they represent ranges that only provide const access to the elements the iterators point to. This is very different from a real container, which would have been const as well: we could have passed it to the three calls directly.

 

Thinking in terms of steps


You might have already guessed that the solution relies on the previous post, in which we introduced functions to combine several traversals into one.

The second important insight provided by the traversal combination functions is realizing that we can deal with collections by decomposing them into steps. These steps can either be plugged directly into an algorithm or be combined together into one step before doing so.

Our API will thus provide step functions that can easily be combined and then used inside the std::accumulate algorithm. We only need two functions per metric:

  • A binary step function, to build a result from a list of elements
  • An initialisation value, or equivalently a no argument function

 

Our metrics as step functions


Let us start by writing our metrics as step functions.

Finding the minimum
Each step is a simple call to the std::min function. The initialisation can be implemented in several ways (with optional for example). I deliberately kept it as stupid as possible:
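A minimal sketch of this pair of functions:

```cpp
#include <algorithm>
#include <limits>

double min_step(double acc, double value) { return std::min(acc, value); }
double min_init()                         { return std::numeric_limits<double>::max(); }
```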

 
Finding the average
It requires a little more work: we need to keep track of the number of elements as we go along. This is the motivation behind the RunningAverage structure:
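A sketch of this accumulator and its step function:

```cpp
struct RunningAverage
{
   double sum   = 0.0;
   int    count = 0;
   double value() const { return count ? sum / count : 0.0; }
};

RunningAverage average_step(RunningAverage acc, double value)
{
   acc.sum += value;
   ++acc.count;
   return acc;
}
```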

 
Computing big variations
This last step function requires capturing a limit parameter and remembering the first value. It could be done differently, but it is kept voluntarily simple here:
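A sketch of this step function (the structure and names are guesses):

```cpp
#include <cmath>

struct BigVariationCount
{
   bool   has_first = false;
   double first     = 0.0;
   int    count     = 0;
};

// Captures the limit and counts the values deviating from the first one by more than it
auto big_variation_step(double limit)
{
   return [limit](BigVariationCount acc, double value)
   {
      if (!acc.has_first) { acc.has_first = true; acc.first = value; return acc; }
      if (std::abs(value - acc.first) > limit) ++acc.count;
      return acc;
   };
}
```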

 

Assembling the steps


To finish up, we only need to provide a small generic helper function, based on par_steps, to combine all these steps together:
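The real par_steps comes from the previous post; a plausible two-step version is sketched below, pairing two step functions into one that threads a std::pair of accumulators (the original presumably generalises to tuples of any size):

```cpp
#include <utility>

template <typename StepF, typename StepG>
auto par_steps(StepF f, StepG g)
{
   return [f, g](auto const& accumulators, auto const& value)
   {
      return std::make_pair(f(accumulators.first, value),
                            g(accumulators.second, value));
   };
}
```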

 
Now, our client only needs to assemble the steps together to extract all the metrics it needs. It can be done quite elegantly:
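For instance, with the sketches above (quotes being the client's raw source of strings):

```cpp
#include <boost/range/numeric.hpp>   // boost::accumulate

auto values = stock_values("AAPL")(quotes);

auto step = par_steps(min_step, average_step);
auto init = std::make_pair(min_init(), RunningAverage{});

auto result = boost::accumulate(values, init, step);
// result.first          -> the minimum
// result.second.value() -> the average
```

Adding the third metric is just another nested call to par_steps, for example par_steps(min_step, par_steps(average_step, big_variation_step(2.0))), with the initial value extended accordingly.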

 
Last but not least, if the client feels uncomfortable dealing with tuples in the rest of the code, wrapping the fold with std::apply will let him summarise his results easily.
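For instance (std::apply is C++17 and works on std::pair as well as on std::tuple):

```cpp
#include <tuple>   // std::apply

std::apply(
   [](double minimum, RunningAverage average) {
      std::cout << "min = " << minimum << ", avg = " << average.value() << '\n';
   },
   result);
```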

  

Conclusion


This was a pretty long post, much longer than I first anticipated. I hope you enjoyed it nevertheless, and that it showed you the benefits of thinking in terms of lens functions and traversal combinators:

  1. We are not limited to creating collections from other collections each time we want to transform them. Instead, we can provide “views” on collections by combining ranges and lenses.
     
  2. We are not limited to traversing the collections each time we want to compute something from their elements. Instead, we can combine our computations before running them on our entire collection.

In the end, the tools themselves are not really complex and not that important. The main benefit of knowing them is being able to think differently about collection processing, and sometimes to come up with shorter, faster and more elegant solutions.