Lisp Meta-Programming for C++ Developers: Automatic Traces

In the previous posts, we started a series dedicated to familiarise C++ developers to the world of LISP meta-programming, based on the Clojure programming language. The goal is to offer a different perspective on meta-programming, how we can approach it, and the kind of use cases it can address.

We started the series by an introduction on Clojure, homoiconicity and how meta-programming works in LISP dialects. We then visited the world of compile time computations and constexpr functions. You can find below our previous posts:

 
In today’s post, we will leave the world of compile time computations to go into the world of AST manipulations. Our goal will be to show how interesting such a feature would be for C++, knowing that it is a topic under discussion.

To do so, we will build a set of macros that will allow us to automatically log the inputs of a function at each call. Through this, we also hope to convince C++ developers that AST manipulations and reflection are amazing meta-programming tools. We will also see why being able to perform side-effects at compile time is an interesting feature.

Note: this post builds on the previous posts of the series. Unless you already know Clojure and its macros, you need to read the previous posts to be able to follow this one.

 

Motivation: Automatic traces


In a piece of software, some functions are critical. For operational concerns, we might want to know when and how they are used at runtime. Practically, we would like to have useful log traces reporting each call to these critical functions.

 

Requirements

To be useful, we want these log traces to contain the name of the function, as well as the arguments that are passed to it at each call. As a bonus, we would like the name of the arguments as well as their values to appear in the logs.

To be easily exploitable, we want all the log traces of the entire program to follow the same pattern. We would like to make sure that it is hard to mess up, such as logging the wrong value or the wrong argument name.

We want to support two use cases. In one use case, the logs might toggled on and off at run-time. In the other use case, the logs are enabled in debug builds only, and release builds should not contain code related to logging, to be as fast as possible.

 

Examples of logs

For instance, we want to add logs to a function named repeat-n in Clojure, and repeat_n in C++. This function takes two arguments, an integer named n, and a string named s, and returns the string repeated n times.

Upon calling the function with the arguments 3 and “toto”, we expect the following log to be produced (to keep it simple, in the standard output):

[TRACE] Call to "repeat-n" with arguments: {n 3, s "toto"}

We want the real name of the parameters of the function to appear in the logs. If the parameters are renamed to m and str, we expect the new log to be:

[TRACE] Call to "repeat-n" with arguments: {m 3, str "toto"}

 

Why keep the name of the arguments?

For one, it helps understanding the logs by providing more information. We would otherwise have to match the values as they appear in the logs with the source code, to deduce the arguments and thus the meaning of the values.

It also makes it possible to log only parts of the arguments. We could decide to ignore an argument that is meaningless, in which case having names allow us to ignore any argument and not only the last one (as for default parameters).

Finally, because logging the argument names makes the problem non trivial and interesting. Logging the argument values alone (in the order in which they appear in the function) can be achieved by just a variadic higher order function.

 

Design Philosophy

To avoid any human errors, we would like the logs to be as automated as possible. We want a simple way to add, without much room for errors, logs that traces the calls to our important functions.

But things are never quite simple, we might want to manually take control to avoid logging a useless parameter (a tag dispatch argument for instance). We thus want an API we could manually call in case the automatic mode is not desirable.

Note: This section is called “design philosophy” but contain something that looks like a requirement. In truth, with or without this requirement, we should aim at provided APIs first, and then find ways to automate their usage for common cases. Designing this way almost always leads to more re-usable code and less coupled design.

 

Hidden challenges


One of the requirement above might have struck you as being quite hard to achieve: the logs shall contain the real name of the parameters. Achieving this requires the language to be able to understand itself, ability that we usually call reflection (this is not quite exact if you look at the actual definition, but this will do).

 

Why reflection?

The ability for a language to understand itself is really powerful. It allows automating repetitive tasks when used in combination with meta-programming. With it, we can automate things such as serialisation, or logging arguments in our case.

Clojure is able to understand its own AST and has naturally the ability to understand itself both at compile time and at runtime. We will show how to use this capability to implement our automatic logging facility.

 

Hacking our way around

C++ has not yet access to reflection capabilities. The good news is that there is a lot of on-going discussion to bring such a capacity into C++. You can find a recent post that talks about some of these initiatives.

In the meantime, we will do without it. We will use strings as parameter names, and will rely on C-macros to string-ify function’s parameter names using the # operator. This is ugly, but preprocessor directives are still needed in C++ and it will get the job done.

 

An API for logging


We will start by building a small API (a few functions) to manually log function calls with their arguments, and later (in our next post) automate the calls to this API. We will begin with the Clojure implementation; the next section will deal with C++.

 

Clojure implementation

Our Clojure implementation will consist of two functions:

  • One to compute the prefix “[TRACE] Call to FUNCTION_NAME with arguments:”
  • One function that takes a prefix and logs the argument names and values.
(defn function-log-prefix
"Return the prefix of the log entry for a function"
[fct-name]
(str "[TRACE] Call to \"" fct-name "\" with arguments: "))
(defn log-symbol-values
"Log the values of a list of symbols, with a prefix"
[prefix symbols values]
(str prefix (zipmap symbols values)))

The implementation uses zipmap will creates a map out of a list of keys (the argument symbols) and a list of values (the argument values).

 

How to use it

The following example demonstrate how we can use the API to log the values of the global variables a and b, using log-symbol-values:

(def a 1)
(def b 1)
(log-symbol-values
"Prefix: " ;; Prefix of the log
'[a b] ;; Quoted variables list
[a b]) ;; Values of variables
=> "Prefix: {a 1, b 1}"
  • Line 1 and 2: we defined two global variables a and b
  • Line 5: we provide a prefix of our choice to the log-symbol-values
  • Line 6: we provide the quoted symbols of the variables to log
  • Line 7: we provide the values of the variables to log

You might have noticed the simple quote instead of the backquote at line 6. The effects are almost the same (*). The important thing is that we escape the symbols: we provide a and b un-evaluated to the log-symbol-values so that we can get their name in the logs.

(*) Technically, simple quote do not support interpolation as the backquote do. They also do not resolve the namespace of the symbols. We use them for the second effect here.

 

Why two functions?

First, because of separation of concerns: we might be interested in logging the value of some symbols outside of the context of a function, in which case we can use the function with a custom prefix. It also helps testing as we just saw.

Second, to separate things that evolve at a different speed. The function name will not change over the course of a program, whereas the values of the arguments we pass to it will. Separating them allows us to leverage this difference, such as precomputing the prefix at compile time (which we will do in our next post).

 

An API for logging (C++)


In this section, we will implement in C++ the same logging API we did for Clojure. To do so, we will have to leverage some meta-programming constructs that were not needed for Clojure.

 

C++ implementation

Let us get the easy stuff out of the way, the computation of the prefix:

std::string log_prefix(std::string const& fct_name)
{
return "[TRACE] Call to \""s + fct_name + "\" with arguments: "s;
}
view raw log_prefix.cpp hosted with ❤ by GitHub

 
Now, logging the arguments will require a bit more work than in Clojure. In dynamically typed language, we transparently deal with heterogenous containers. We can put anything inside a Clojure vector, like argument values of different types.

In statically typed languages however, we often have to resort to meta-programming constructs to deal with these heterogeneous sequences. This is trickier than using STL standard containers, so we will do it step by step.

We will start by trying to log a list of heterogenous arguments (we forget the names of the arguments for now). We make use of variadic templates:

template <typename ...Args>
void log_trace(std::ostream& out, std::string const& prefix, Args const& ...args)
{
out << prefix << "(";
// Something goes here, but what?
out << ")\n";
}

Now, we have to complete the function to stream the argument values in out. We will explore different ways to do so.

 

C++17 fold expression

C++17 introduced fold expressions. Let us use them to stream our variadic arguments args into the out std::ostream:

template <typename ...Args>
void log_trace(std::ostream& out, std::string const& prefix, Args const& ...args)
{
out << prefix << "(";
(out << ... << args); //Problem: all logs are mushed together!
out << ")\n";
}

This code does stream all the arguments. But all the arguments are logged next to each others, with no comma or space between the element, no clear separation:

//Calling fold expression implementation
log_trace(std::cout, "Prefix: ", 4, "fizz", "buzz");
//Output:
"Prefix: (4fizzbuzz)"

The introduction of fold expressions generated a bit of buzz. They look awesome at first, but are pretty inflexible, with a a fixed and limited set of operators. There is no easy way for us to add the desired comma separator. Let us try a better alternative.

 

Say Hello to Boost.Hana

Boost Hana is a truly beautiful and surprisingly simple to use API offering a value-based approach to meta-programming. If you have not watched it yet, have a look at this CppCon talk from the creator of the library, Louis Dionne.

Boost Hana has very good support for programming with heterogeneous containers, precisely our use case, by providing custom heterogenous containers and a collection of algorithm we can use to operate on these containers.

Using it, we can implement our first version of log_trace, that stream the arguments we pass it, separated with commas:

template <typename ...Args>
void log_trace(std::ostream& out,
std::string const& prefix,
Args const& ...args)
{
out << prefix << "(";
hana::for_each(
hana::intersperse(hana::make_tuple(args...), ", "),
[&out](auto const& arg) { out << arg; });
out << ")\n";
}
  • make_tuple creates the enhanced std::tuple of Boost.Hana
  • intersperse inserts a value between each pairs of elements
  • for_each is the equivalent of STL for_each on heterogenous containers

We can check that this simple code yields the result we expect:

log_trace(std::cout, "Prefix: ", 4, "fizz", "buzz");
//Output:
"Prefix: (4, fizz, buzz)"

As we can see, Boost.Hana allows to write meta-programming code more as we would write normal code. This is important for accessibility and Boost.Hana is clearly a great step in the right direction.

 

Adding the argument names

Using Boost.Hana, we can complete our implementation to include the argument names in the logs we produce.

template <typename ...Names, typename ...Args>
void log_named_trace(
std::ostream& out,
std::string const& prefix,
hana::tuple<Names...> const& arg_names,
Args const& ...args)
{
out << prefix << "{";
out << hana::intersperse(zip_args(arg_names, hana::make_tuple(args...)), ", ");
out << "}\n";
}

We use zip_args to create pairs from the arguments name and values, inserting a space between them:

template <typename ...Names, typename ...Args>
auto zip_args(
hana::tuple<Names...> const& arg_names,
hana::tuple<Args...> const& arg_values)
{
return hana::zip_with(
[](auto const& name, auto const& val) {
return hana::make_tuple(std::string(name) + " ", val); },
arg_names, arg_values);
}
view raw zip_args.cpp hosted with ❤ by GitHub

This implementation is greatly simplified by the overload of operator<< that allows to indifferently log strings, pairs of arguments, or a tuple of these:

template <typename ...Args>
std::ostream& operator<< (std::ostream& out, hana::tuple<Args...> const& t)
{
hana::for_each(t, [&out](auto const& e) { out << e; });
}

We can have a look at the result, which are quite convincing but not perfect. We would like quotes to appear around the strings “fizz” and “buzz”:

log_named_trace(std::cout, "Prefix: ",
hana::make_tuple("n", "s", "t"),
4, "fizz", "buzz");
//Output:
"Prefix: {n 4, s fizz, t buzz}"

We could improve this implementation to fix the issue with strings with a bit more efforts, but this will do for now.

 

Manual logging is error prone


Now that our logging API is done, we can use it to traces the calls to our functions, along with the arguments with which they were called.

 

C++ example

We will go back to our initial example, tracing the calls of the function repeat_n and the associated argument values:

std::string repeat_n(int n, std::string const& s)
{
std::string out;
out.reserve(s.size() * n);
for (int i = 0; i < n; ++i)
out.append(s);
return out;
}
view raw repeat_n.cpp hosted with ❤ by GitHub

We can either enrich the function to log its inputs, or we can create a wrapper function repeat_n_with_logs that will handle it. The second solution is preferable here to better separate concerns and avoids adding side effect in repeat_n. In real use cases, logs are often added to function that already do side effects, so this is less of an argument.

We demonstrate how to use our log API using these wrapper function, with two alternatives: one that logs only the argument values (positional) and one that logs the argument names and value.

//Alternative WITHOUT the argument names
std::string repeat_n_with_logs(int n, std::string const& s)
{
log_trace(get_logger(), log_prefix("repeat_n"), n, s);
return repeat_n_(n, s);
}
//Alternative WITH the argument names
std::string repeat_n_with_logs(int n, std::string const& s)
{
log_named_trace(get_logger(), log_prefix("repeat_n"),
hana::make_tuple("n", "s"), n, s);
return repeat_n_(n, s);
}

Below are the traces that would appear upon calling the function repeat_n with the arguments 3 and “toto”, for both alternatives:

repeat_n_with_logs(3, "toto");
//Alternative WITHOUT the argument names:
"[TRACE] Call to \"repeat_n\" with arguments: (3, toto)"
//Alternative WITH the argument names:
"[TRACE] Call to \"repeat_n\" with arguments: {n 3, s toto}"

 

Manual calls can (and will) fail

It does the job, but the least we can say is that it is not elegant, very repetitive, and therefore pretty error prone as well. There are many different ways to mess up this simple call to log_named_trace:

  • Upon renaming, forgetting to align the names appears in the logs
  • Upon adding a new argument, forgetting to add it in the logs
  • Messing the ordering of the arguments and their associated names (*)

The Clojure version is a bit shorter (and thus probably a bit easier to maintain), but otherwise has the exact same drawbacks:

(defn repeat-n
[n s]
(log-symbol-values (function-log-prefix fct-name) '[n s] [n s])
(apply str (repeat n s)))

We need a way to automate the calls to these logging functions, if we want to produce reliable traces. The goal of this automation is both to reduce the amount of code to write, but more importantly to reduce the number of errors we can make along the way.

(*) We could mitigate this problem by taking a variadic number of pairs. This would make sure the argument names count is equal to the argument count. It would still not remove all potential errors. You are encouraged to try this alternate design.

 

Conclusion and what’s next


In this post, we started the implementation of an automatic log facility to trace function calls and the values of their arguments. We described our requirements, and built a first basic API to log a bunch of variables and their respective values.

Through this, we saw how meta-programming and Boost.Hana could be used to deal with heterogeneous containers. Use cases for heterogeneous containers tend to occur pretty often in statically typed languages, when trying to build generic features.

We also demonstrated how supporting (at least a limited form of) reflection was useful to deal with our use case. The current lack of support for reflection in C++ forced us to use strings as argument names, where languages such as Clojure can directly manipulate the symbols they are made of (and have dedicated types for symbols).

In the next post, we will use meta-programming to automate the calls to the log API we built in this post. We will see how we can work-around the lack of support for reflection in C++ by using C-macros, and how reflection and AST manipulation make the work easier in Clojure.


You can follow me on Twitter at @quduval.

Leave a comment

Create a website or blog at WordPress.com

Up ↑