LISP Meta-Programming for C++ Developers: Compile time computations

In the previous posts, we started a series dedicated to familiarise C++ developers to the world of LISP meta-programming, based on the Clojure programming language. The goal is to offer a different perspective on meta-programming, how we can approach it, and the kind of use cases it can address.

We started the series by an introduction on Clojure, homoiconicity and how meta-programming works in LISP dialects. We introduced the basics of macros through simple examples and started to discuss compile time computations:

 
In the last post of the series, we talked about constexpr and how it makes compile time computation in C++ more accessible, testable and maintainable.

In today’s post, we will continue exploring compile time computations, and study some of the differences between what we can do with macros and constexpr. We will discuss the advantages of both constructs, and some of the constraints of C++ constexpr.

Note: this post builds on the previous posts of the series. Unless you already know Clojure and its macros, you need to read the previous posts to be able to follow this one.

 

C++ Constexpr composability


We will start our study by showing one thing that works really well with C++ constexpr functions: the ability to de-compose (and recompose) compile time computations, something that does not work as well with Clojure’s macros.

 

Back to compile time additions

In our previous post, we introduced how to perform compile time computations through the simplest possible example: compile time addition.

We went through and dissected how to do it in both Clojure and C++. You can find below the corresponding Clojure macro and C++ constexpr function:

(defmacro add-m
"Summing two integers known at compile time"
[a b]
(+ a b))
view raw add-m.clj hosted with ❤ by GitHub
constexpr int add_constexpr(int a, int b)
{
return a + b;
}

If you do not feel comfortable with these constructs, pause there and go back to the previous post before continuing, as it is a pre-requisite for the rest of the post.

 

The problem with the add-m macro

Remember than a macro in LISP works by manipulating the AST fragments it receives as input, and produces an AST fragment as output.

Upon finding the macro call (add-m 1 2), the Clojure compiler will expand the macro. To do so, it calls the add-m macro with the AST fragments wrapped by the macro as arguments.

In our example, the arguments are the two AST fragments 1 and 2. Since these AST fragments happen to be integers, they can be summed together. The net result is 3 as compile time output:

(walk/macroexpand-all '(add-m 1 2)) ;; Expanding the macro
=> 3 ;; A constant after compilation

This is quite subtle and relies pretty heavily on dynamic typing to work. But when we introduce the intermediary variables x and y and call add-m on them, the Clojure compiler gets lost, and we get a compilation error:

(def ^:const x 1) ;; Global constant x = 1
(def ^:const y 2) ;; Global constant y = 1
(add-m x y) ;; Trying to sum them at compile time
;; The compilation fails: we try to sum symbols, not numbers
ClassCastException clojure.lang.Symbol cannot be cast to java.lang.Number

Even though x and y are symbols that refer to integer values (1 and 2 respectively), the AST fragments given to add-m are the symbols themselves and not their value. And since we cannot sum symbols, it fails to compile. We will see later in this post how to solve this.

 

C++ constepxr compose just fine

There is no such problem with constexpr. As long as the variables we sum are constant expressions, we can introduce temporary variables and it works fine:

constexpr int add(int a, int b)
{
return a + b;
}
//Works fine, and z will be equal to 3
constexpr int x = 1;
constexpr int y = 2;
constexpr int z = add(x, y);

Being able to decompose an expression is the key to build maintainable software: decomposition allows decoupling and separation of concerns. C++ constexpr functions offer this characteristic out of the box. As C++ developers, we should be greatful for this.

And since constexpr functions are both usable as standard run-time function and as a meta-function, C++ ends up offering a quite satisfying meta-programming experience, when it comes to simple compile time compilations.

 

Functions and macros

Functions are much more composable than macros. This is one important lesson that Clojure programmers learn quite early when experimenting with meta-programming.

Since composition is such a great weapon against software complexity, functions should be preferred over macros whenever both can do the job. Macros should be reserved for tasks that cannot otherwise be performed with functions. Whenever we can, we should extract the heavy lifting of macros inside functions.

The reason why constexpr functions are more convenient to decompose than macros is precisely because constexpr functions are just functions.

 

Extending the language


We just learned how compile time computations using macros happens to be less composable than using C++ constexpr functions. Can we fix this?

One of the good thing with macros is that we can use them to extend the language (almost) as we wish. Let us try to build a constexpr equivalent for Clojure.

 

Hacking our way around

Before trying to patch the language by adding some new constructs in it, let us see how we can fix our issue differently. One way to fix a problem with a macro is to introduce more macros (seems legit).

The trick we will use is to create a macro that will return a code fragment that contains another macro inside it. The Clojure compiler will keep expanding macros as long as new macros are returned.

For instance, to compute z the compile time sum of x and y, we create a macro named z that constructs a code fragment where add-m is called on the values of x and y:

(defmacro z []
(let [x 1
y 2]
`(add-m ~x ~y)
))
(walk/macroexpand-all '(z))
=> 3

How does it work? Remember that backquote acts as an escape mechanism to return an unevaluated AST, while tilde allows to do “interpolations” inside the returned code fragment.

So the macro z will return the code fragment (add-m 1 2) in which a and b have been replaced by their values, thank to the tilde. Since add-m is also a macro, Clojure will expand it, yielding 3.

Note: If you are unclear about how quoting (using backquote) and unquoting (using tilde) works, you can refer to our previous article in which we explained this in more details and using simpler examples.

 

Extending the language

We just saw how to decompose a compile time computations using local variables inside a macro. If we want to make x and y visible as global constants instead (to reuse them elsewhere for instance), we will have to use something else.

Because macros allows extending the compiler, it effectively grants the developer the power to add new features into the language. This is quite common and part of the philosophy of LISPs. We will use this to solve our problem.

Let us create a constexpr equivalent for Clojure. To do so, we will use another interesting feature of Clojure, the eval function. This function takes an AST fragment as input, and evaluates it (*):

`(+ 1 2) ;; Escaping an AST fragment
=> (clojure.core/+ 1 2) ;; Returns the created fragment
(eval `(+ 1 2)) ;; If we call `eval` on it
=> 3 ;; We get back the value

Based on eval, we define below our constexpr macro. This macro creates an AST that corresponds to the call of the function f with its arguments args, and then immediately evaluates the constructed fragment:

  • The args will capture all arguments after fct (like variadic templates: …args)
  • The tilde-arobase unpacks the variadic arguments in the enclosing expression
  • The eval function then evaluates the constructed fragment
(defmacro constexpr
[fct & args]
(eval `(~fct ~@args)))

Let us first illustrate how we can use it before explaining how it works in more details. Here is how we could define a add-m using the constexpr macro. Now, we can sum two global constants:

(defmacro add-m
[a b]
`(constexpr add ~a ~b))
(def ^:const x 1)
(def ^:const y 2)
;; Summing two global variables at compile time
(walk/macroexpand-all '(add-m x y))
=> 3

To understand how it works, we will go step by step. Calling (add-m x y) first returns the AST fragment (constexpr add x y), which contains a second macro. The Clojure compiler sees another macro, and will thus expand this AST fragment as well.

Expanding (constexpr add x y) will create an AST fragment (add x y) before immediately evaluating it. The evaluation will resolve the symbols add, x and y (yielding the definition of add, and the values 1 and 2 respectively), before proceeding with the function call. And so we get 3.

The key difference with our previous macro is the use of eval. Because eval resolves the symbols x and y, we end up summing integers and not symbols. The different steps of macro-expansion are depicted below:

(add-m x y)
=> (constexpr add-m x y)   ;; Macro-expansion
=> (eval (add x y))        ;; Macro-expansion
=> (definition-of-add 1 2) ;; Symbol resolution
=> 3                       ;; Application of add

(*) Some languages such as Python or Ruby do offer eval functions as well. The key difference is that the Clojure variant takes an AST as input, while the usual eval function takes a string. This difference is an important one: dealing with data structures is more powerful than doing string manipulations.

 

Constexpr combines two solutions into one

We just saw two different ways for Clojure to deal with the decomposition of a compile time computation into pieces. One technique applies for decomposition into local computations. The other applies for decomposition into global computations.

This means we have two different solutions in Clojure, while C++ provides a single way to deal with the same two problems. Clearly, constexpr has some good advantages here over macro in terms of usability.

This shows how fantastic the addition of constexpr was for C++. It provides a consistent syntax for functions and meta-functions, it allows to test meta-functions, and it is quite intuitive compared to alternatives. But it is also limited as we will see.

 

Macros strike back


The previous sections show some of the very good sides of C++ constexpr. We will now explore some examples of compile time computations where constexpr functions are not as flexible as macros.

 

Not so hidden motivations

These examples are not meant to diminish the merits of constexpr or mock C++ meta-programming. They are meant to discuss some shortcomings of the current state of constexpr, not its potential.

In fact, the current limitations are related to the current constraints imposed on constexpr by the standard, some of which could probably be relaxed. Which constraints could be relaxed is the subject of the next section.

 

Compile time average

Compile time computations are rarely as simple as in our previous examples. Summing two integers at compile time has not much value added. Optimizing compilers will usually do it anyway.

Let us try something slightly more complex in this section: computing the average of a collection of values, at compile time. As a reminder, we explained how to implement average in Clojure in our very first post:

(defn average
"Average of numbers known at runtime"
[coll]
(/ (reduce + coll) (count coll)))

Where reduce is the Clojure equivalent of the STL std::accumulate. What it does is in essence the same as the following C++ code:

template<typename Container>
int average(Container const& coll)
{
return std::accumulate(begin(coll), end(coll), 0, std::plus<int>{}) / coll.size();
}

 

Clojure solution

Because we can call any function inside a macro, implementing a compile time average, with the macro average-m, is only a matter of calling our already implemented average function:

(defmacro average-m
[coll]
`(constexpr average ~coll))
(walk/macroexpand-all '(average-m [1 2 4 5]))
=> 3 ;; After compilation

It does not look like much. But there is something here that a C++ constexpr function would not be able to do. We just used a Clojure’s standard vector (which require dynamic allocation) inside a compile time computation.

 

Constexpr restrictions

If you read the specification of constexpr, you find that it is subject restrictions that come from the specification of constant expressions in C++.

For instance, both inputs and outputs have to be literal types. As a result, the following constexpr function will refuse to compile, GCC complaining that std::vector is not a literal type:

template<typename Number>
constexpr Number average(std::vector<Number> const& vals)
{
return 0; //Implementation does not matter, it won't compile
}

In fact, GCC will only complain if average is used inside a constexpr context:

// Compiles fine
int avg = average(std::vector<int>{1, 2, 4, 5});
// Does not compile
constexpr int avg = average(std::vector<int>{1, 2, 4, 5});

The item 18 of the constant expression specification also specifies that we cannot use new or delete to compute a constant expression. As a result, compiling the following function fails whether or not the function is used in a constant expression context:

constexpr int no_new_delete()
{
std::vector<int> ints{1, 2, 3};
return ints[0];
}

 

C++ solutions

The before mentioned restrictions mean that, to compute our average at compile time, our constexpr function cannot take a vector as input, nor can it instantiate one inside its implementation. So we have to resort to alternative solutions.

One solution is to use variadic templates and the fold expression of C++17:

template<typename... Vals>
constexpr int average(Vals... ds)
{
return (... + ds) / sizeof...(ds);
}
// Compute the value `3` at compile time
average(1, 2, 3, 4, 5);

Another solution is to use std::array, a valid literal type:

template<typename Number, size_t N>
constexpr Number average(std::array<Number, N> const& vals)
{
Number total = 0;
for (int i = 0; i < N; ++i)
total += vals[i];
return total / N;
}

Quite unfortunately, we cannot make use of the std::accumulate algorithm to make our solution less imperative. Most algorithms are not constexpr yet, and so the following implementation does not compile:

template<typename Number, size_t N>
constexpr Number average(std::array<Number, N> const& vals)
{
// Fails to compile as std::accumulate is not constexpr
return std::accumulate(vals.begin(), vals.end(), 0, std::plus<Number>{}) / N;
}

 

Impact of these restrictions

Because of the heavy restrictions on C++ constexpr functions, a lot of tools available at runtime are not accessible at compile time. Standard containers are out of reach. Algorithms are not yet available. Any code that is not constexpr cannot be reused.

This contrasts with Clojure, in which there is no difference between the runtime and compile time world in terms of code we can write. The same data structures are available, the same algorithms, with the same behavior and almost the same performance.

We discussed in our previous post how important it was for accessibility to have a similar syntax and the same set of tools for both function and meta functions. Because of these restrictions, constexpr functions do not quite reach this ideal.

 

Arguments for relaxing constexpr constraints


One rationale behind the restrictions on constexpr functions is that the forbidden constructs (such as dynamic allocation) do not make sense for constant expressions. Let us discuss this rationale and see whether some of them could not be relaxed.

 

Relaxing the implementation

Let us assume for now that both inputs and outputs of a constexpr function indeed have to be literal types. Why could we not relax the constraints for what concerns the implementation of the function?

In Haskell, pure functions are allowed to mutate data inside them. As long as no side-effect can be observed from the outside, a function can be considered pure. The same goes for noexcept functions which may call functions that can throw.

Would the same line of reasoning apply for constexpr function? A constexpr function could be allowed to instantiate a vector or a map as long as it stays an implementation detail and does not leak through the interface.

 

Performance considerations

Let us consider a function frequency_map that takes as input an array of integers, and computes the occurrence count of each element.

It cannot return a std::map associating to each value an occurrence count, because of the literal type restriction. It cannot return a std::array of pairs containing the associations either, as the size of this array would be hard to know a priori.

So we make it return an array of integers of the same size as the input array. Each value of the output array is the occurrence count of the corresponding element in the input collection (the element at the same index).

For instance, for the input [1, 2, 1, 1, 3, 3, 1], we get:

Input:  [1, 2, 1, 1, 3, 3, 1]
Output: [4, 1, 4, 4, 2, 2, 4]

The constexpr restrictions already constrained our interface pretty heavily. They also constrain the implementation of the function. Even with our selected prototype, we could imagine using a std::map to make the algorithm efficient:

template<typename Value, size_t N>
constexpr std::array<int, N> frequency_map(std::array<Value, N> const& values)
{
std::map<Value, int> freq_map;
for (int i = 0; i < N; ++i)
freq_map[values[i]] += 1; // Count each occurence using a std::map
std::array<int, N> freqs {}; // Reconstruct a std::array from our std::map
for (int i = 0; i < N; ++i)
freqs[i] = freq_map[values[i]];
return freqs;
}

But it will not compile, since we cannot use a std::map in the implementation either. We are therefore left with two choices:

  • Use a less efficient algorithm (potentially N squared complexity)
  • Develop a constexpr compatible version of the STL associative containers

None of these alternatives looks very appealing. The first might slow down compilation time due to the algorithm’s inefficiency. The second forces developers to learn another container API and violates DRY by doing so.

 

Performance traps

The talk Constant fun (CppCon 2016) shows the difficulty of developing our own efficient constexpr-compatible data structure or algorithm. It also shows the kind of performance traps waiting for us when we decide to do so.

The speaker shows at the 42nd minute that the performance of his compile-time merge sort algorithm is quite slow. The speaker does not show the full implementation, but the complexity of the algorithm might be a valid explanation for this slowness.

The implementation indeed makes use of cons and tail to manipulate arrays. Both are likely lacking the efficiency of their functional programming counterparts, which work on list with O(1) complexity: creating new arrays has O(N) complexity.

So both cons and tail will likely trigger copies of the whole array, making the merge operation pretty expensive. This same performance defect may occur with repeated push_back-like calls on std::tuple. Without amortization techniques or structural sharing, creating a tuple by pushing back N times in it has a quadratic complexity.

 

Going further: relaxing inputs and outputs

What about relaxing the constraints of literal types on inputs and outputs? Would it be feasible? And would it be worth it? We will list below some arguments that would go in favour of relaxing the literal type constraint.

Consider below how simple it is in Clojure to compute a frequency map at compile time, what we just struggled with in C++. In Clojure, it is simply a matter of calling a function that does the job at run-time (freq-map here):

(defn freq-map
[coll]
(reduce
(fn [freqs val]
(update freqs val (fnil + 0) 1)) ;; Adds + 1 on each value
{} ;; Initial frequency map (empty)
coll)) ;; The values to add in the map
(defmacro freq-map-m
[coll]
`(constexpr freq-map ~coll)) ;; Call the run-time version
(walk/macroexpand-all
'(freq-map-m [1 2 1 4 1 3])) ;; Macro-expansion to simulate compilation
=> {1 3, 2 1, 4 1, 3 1} ;; Values computed at compile time

Consider the advantages of having the same level of performance at compile time than at run-time. This above naive implementation takes 15 milliseconds for an input of 10000 elements at compile time. Optimized versions go below 5 milliseconds.

This is two to three times slower than the run-time version, but it is still pretty decent performance for a compile time computation. But most importantly, this ensures that compile time computations will not be the bottleneck of the compilation.

Consider that being able to produce standard containers at compile time also has a positive impact on run-time performance. The runtime code will not need to transform custom compile-time-compatible data structures back to standard containers.

 

It is not that bad

C++ does not need to support non-literal types for inputs and outputs of constexpr functions. There are alternatives to compute things such as frequency maps. Optimizing compilers might be able to do it automatically if everything is made const properly.

We could also push the computation at “init time”, forcing the computation to happen just once at the initialisation of the program, and storing the result in some static variable (either global or local depending on the need).

Another alternative, if such a computation is not affordable at “init time” (and is not optimised away), is to generate the result using an external program. This causes some problem on its own: like informing interested parties that the source of truth is outside of the source code of the program.

 

Conclusion and what’s next


Through this post, we went deeper into the world of compile time computations in both C++ and Clojure, and compared constexpr with macros.

Through this exploration, we discovered the really good sides of the C++ constexpr functions and the pleasant meta-programming experience it offers. We also saw some of its limitations, when faced with more complex compile time computations.

Those limitations are mainly due to the constraints that constant expressions have in C++. Some of these restrictions could maybe be relaxed for constexpr function. We discussed one potential improvement, and the benefits it could bring, both in terms of maintenance and compile times.

This closes the chapter of compile time computations. In the next post, we will continue our meta-programming journey by diving into AST manipulations. We will see how we can leverage macro to automate operational concerns, and ways to emulate this in C++.


Follow me on Twitter at @quduval.

Leave a comment

Create a website or blog at WordPress.com

Up ↑