10 things Idris improved over Haskell

Version 1.0.0 of Idris was released just a few months back, just enough time to start trying out the language and some of the possibilities dependent typing offers.

But this post is not about dependent typing. There is already a really good book that came out this year, Type Driven Development with Idris, exploring the benefits (and trade-offs) of dependent typing. I highly recommend reading it.

Instead, this post describes some of the pleasant surprises you get when trying out Idris, coming from the Haskell world. These pleasant surprises have nothing to do with the dependent typing features. They are simple yet impactful changes that substantially improve the developer experience.

I listed my top 10 in the hope that they will convince you to give Idris a try.

 

1) Idris Strings are not lists


It does not seem like much, but Haskell Strings are such a pain to deal with that this improvement earned first place.

Haskell Strings are lists of characters. This big mistake in terms of efficiency is corrected by two very good packages, Data.Text and Data.ByteString. Nevertheless, Strings remain a source of pain and accidental complexity:

  • Newcomers to Haskell almost always face the inefficiency of Strings
  • Strings are not first class, which forces the use of the OverloadedStrings extension
  • Code gets obscured by successive pack and unpack calls

Idris learned from the mistakes of Haskell and made Strings first class. You can call pack and unpack to go back and forth between String and a List Char representation. This is illustrated by a simplistic implementation of the Caesar cipher algorithm:
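A minimal sketch could look as follows (it shifts each character blindly, without wrapping around the alphabet):

    caesar : Int -> String -> String
    caesar shift = pack . map rotate . unpack
      where
        rotate : Char -> Char
        rotate c = chr (ord c + shift)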

We can test this function in the REPL:
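Assuming the sketch above is loaded from a file named Caesar.idr, a session might look like this (output shapes approximate):

    *Caesar> caesar 1 "abcd"
    "bcde" : String
    *Caesar> caesar (-1) "bcde"
    "abcd" : String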

 

2) Overloading without Type Classes


Function overloading has also been the subject of numerous complaints in the Haskell community. Of course, we can work around the issue by using type classes, but this has some drawbacks as well.

Idris learned from these complaints as well, and offers a kind of overloading. You can import the same name from different modules and Idris will not care, as long as it can deduce which one to use.

Even better, Idris introduces the notion of namespaces, which we can use inside a single module to introduce overloads. For instance, we can create a plus function that works for both scalars and lists in the same module:
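Something along these lines (the namespace names are illustrative):

    namespace Scalar
      plus : Int -> Int -> Int
      plus x y = x + y

    namespace Vector
      plus : List Int -> List Int -> List Int
      plus xs ys = zipWith (+) xs ys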

As long as Idris can deduce which overload to use, the developer does not have to prefix the symbol with the namespace:
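For instance, the resolution can happen from the argument and return types alone:

    six : Int
    six = plus 2 4              -- resolves to Scalar.plus

    sums : List Int
    sums = plus [1, 2] [3, 4]   -- resolves to Vector.plus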

This is a very nice improvement over Haskell's rules, which forbid any kind of overloading, and it has nice benefits for the code:

  • It improves the composability of software (by limiting conflicts of names)
  • It avoids resorting to awkward operators just to dodge the ones used by other libraries
  • It remains safe and clear, since explicit namespace prefixes are supported

 

3) Record fields are namespaced


The Idris notion of namespace really shines when it comes to records. The fields of a record live in the namespace of their record. The net effect is that they do not collide with the fields of other records as they do in Haskell.

So we can have several records with the same field name. For instance, we define below an Account and a Customer record, each having an address field:
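The definitions might look like this (constructor and field names are illustrative):

    record Account where
      constructor MkAccount
      number  : Int
      address : String

    record Customer where
      constructor MkCustomer
      name    : String
      address : String
      account : Account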

As with the overloading rules and namespaces above, Idris will try to infer which function is being referred to when the name address is used. Most often, the developer does not have to provide any hint.

Here is an example of a function that uses both the customer address and the account address to check whether they match:
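A sketch, given the record definitions above:

    -- Idris picks Customer.address or Account.address from the argument types
    addressesMatch : Customer -> Bool
    addressesMatch c = address c == address (account c)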

This type-checks correctly. In case of ambiguous calls, you can always add the namespace as a prefix to help Idris figure out your intent:
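The fully qualified version of the same function:

    addressesMatch' : Customer -> Bool
    addressesMatch' c = Customer.address c == Account.address (account c)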

 

4) Record update and access syntax


Updating values in nested records comes at the cost of verbosity in Haskell. There are packages that help navigate data structures, such as lens, but a lot of Haskell users recognise it as a pretty big dependency.

Idris offers a lighter update syntax for the fields of nested records. We can update the address of the account of a customer this way:
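A sketch, reusing the Customer record from the previous section:

    moveAccount : String -> Customer -> Customer
    moveAccount newAddress = record { account->address = newAddress }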

Idris also offers the possibility to apply a function to a field of a nested record. For instance, we can complete (by concatenation) the address of the account of the customer with the $= syntax:
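A sketch:

    completeAccountAddress : String -> Customer -> Customer
    completeAccountAddress suffix = record { account->address $= (++ suffix) }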

We can try both of our previous functions in the REPL to convince ourselves that they work properly (and they do):
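With a sample customer in scope, a session might look like this (output shapes approximate):

    john : Customer
    john = MkCustomer "John" "Baker Street" (MkAccount 1 "Regent Street")

    *Records> moveAccount "Oxford Street" john
    MkCustomer "John" "Baker Street" (MkAccount 1 "Oxford Street") : Customer
    *Records> completeAccountAddress ", London" john
    MkCustomer "John" "Baker Street" (MkAccount 1 "Regent Street, London") : Customer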

This does not replace the need for lenses (which are much more general). But it is nice to be able to count on some basic support in core Idris when we do not need that much power.

 

5) Monad and Functor got fixed


The Haskell Monad method return is absent from Idris, and so is the fail method, which most Haskell developers recognise as a design mistake.

In Idris, Monad extends Applicative (as it now does in Haskell, though it did not for quite a long time), so pure makes return unnecessary. Since the name return conflicts with the return statement of many programming languages, this is one less distinction to explain to newcomers to Idris compared to Haskell.

You can also notice the presence of join in the interface of Monad, allowing us to define whichever of bind or join is more intuitive for a given Monad instance:
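The interface in the Idris 1 prelude looks roughly like this (documentation comments elided):

    interface Applicative m => Monad (m : Type -> Type) where
        (>>=) : m a -> (a -> m b) -> m b
        join : m (m a) -> m a

        -- each method has a default implementation in terms of the other
        (>>=) x f = join (map f x)
        join x = x >>= id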

Looking at the Functor type class, fmap is renamed map in Idris. This also improves over Haskell, by sparing newcomers the awkward explanation of why map is in fact named fmap for legacy reasons:
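Roughly:

    interface Functor (f : Type -> Type) where
        map : (a -> b) -> f a -> f b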

 

6) Laziness is opt-in, not opt-out


There are plenty of good reasons for laziness, there is no denying it. Reading the amazing Why Functional Programming Matters shows how much it helps modularise and decouple our software, by allowing the separation of the processes producing data from the ones consuming it.

But there are plenty of drawbacks to laziness as well. Some are technical (space leaks) and some are psychological (non-strict evaluation is not intuitive when coming from other languages).

A bit like Clojure (and other FP languages) did before, Idris embraces opt-in laziness, with strict evaluation by default. Streams are the lazy equivalent of Lists, and the language features both. You can notice the difference pretty easily when playing with infinite and finite ranges:
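For instance, in the REPL (output shapes approximate): a finite range is a List, while an infinite range is a Stream that we can consume with take:

    Idris> [1..5]
    [1, 2, 3, 4, 5] : List Integer
    Idris> take 5 [1..]
    [1, 2, 3, 4, 5] : List Integer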

 

7) A smaller Num class


For all of those who like to play with DSLs in Haskell, the fact that the Haskell Num class is split into smaller pieces in Idris is a small but nevertheless enjoyable improvement:
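In Idris 1, the interface is reduced to the essentials, with negation-related methods moved to a separate Neg interface (shown here approximately):

    interface Num ty where
        (+) : ty -> ty -> ty
        (*) : ty -> ty -> ty
        fromInteger : Integer -> ty

    interface Num ty => Neg ty where
        negate : ty -> ty
        (-) : ty -> ty -> ty
        abs : ty -> ty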

The Haskell version often leads to dummy implementations of the negate, abs or signum methods. This should no longer occur with the Idris version of the Num class.

 

8) The Cast type class


Idris removed the Read type class and replaced it with the more general Cast type class. The Read type class allowed parsing a String into a type a, effectively a transformation from a String to any other type. The Cast class is parameterised on the input type as well, allowing transformations between any two types implementing it.

We can read a string into an integer as follows (note that reading an invalid string returns a default value instead of throwing an exception):
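For instance, in the REPL (using the to pin the target type):

    Idris> the Int (cast "123")
    123 : Int
    Idris> the Int (cast "hello")
    0 : Int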

The generalisation avoids the multiplication of functions needed to convert between the numeric types. For instance, we can use it to convert (lossily) a double into an integer value:
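For instance, with the fractional part simply dropped:

    Idris> the Int (cast 3.99)
    3 : Int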

 

9) Clear separation of evaluation and execution


Let us write a simple function that reads a line from the standard input, interprets it as a number, doubles it, and prints the result on the screen:
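A sketch of such a function:

    double_it : IO ()
    double_it = do
      input <- getLine
      let n = the Int (cast input)
      printLn (n * 2)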

If you evaluate a call to this function in the Idris REPL, you will get a big pile of abstract syntax tree, which corresponds to the recipe for executing the IO expression described by double_it:
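Heavily abridged, it has roughly this shape (the exact term depends on the Idris version):

    Idris> double_it
    io_bind (io_bind (...) (...)) (\input => io_bind (...) (...)) : IO ()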

To execute the function and not just evaluate it, you need to say so explicitly, by prefixing the expression with :exec:
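For instance, typing 21 when prompted:

    Idris> :exec double_it
    21
    42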

Why consider this an improvement over Haskell? If you are a Lisp adept, you cannot but appreciate the reification of IO actions into a proper AST. It may just be personal taste, but I love it.

 

10) The Do notation without monads


I kept the best part of Idris for last. Thanks to its support for overloading, Idris allows the developer to define a bind operator outside of the Monad type class, and the do notation is based on the sole presence of this operator.

This is truly an awesome idea. It allows the developer to profit from the nice syntax of the do notation without having to comply with the interface of the Monad bind operator.

  • It avoids forcing awkward designs just to fit the Monad type class requirements
  • It opens up the notation to numerous types that could not be made proper instances of Functor, Applicative or Monad, because they are not parameterised
  • It keeps the do notation available for dependent types, which only rarely conform to the interface of Monad

 
It also allows us to abuse the do notation even more. More power means more chances to do horrors as well. Here is an example of abuse, sketched after the list below, showing that we do not even need a parameterised type for the do notation anymore:

  • It describes a not very useful StringBuilder DSL
  • Evaluating an expression of this DSL with eval_string yields a String
  • Defining the bind operator allows us to describe our DSL expression with do
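A sketch of such a DSL (type, constructor and function names are illustrative):

    data StringBuilder
      = Lit String
      | Seq StringBuilder StringBuilder

    eval_string : StringBuilder -> String
    eval_string (Lit s)   = s
    eval_string (Seq a b) = eval_string a ++ eval_string b

    -- A bind operator outside of any Monad instance: enough for do notation
    (>>=) : StringBuilder -> (() -> StringBuilder) -> StringBuilder
    (>>=) a f = Seq a (f ())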

Here is an example of usage of this DSL, in which we create a StringBuilder expression named john, using our beloved and dreaded do notation:
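With the sketch above:

    john : StringBuilder
    john = do
      Lit "John "
      Lit "Doe"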

Evaluating our StringBuilder expression shows us the AST, and calling eval_string on it interprets it as a String:
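Again with the sketch above (output shapes approximate):

    *StringBuilder> john
    Seq (Lit "John ") (Lit "Doe") : StringBuilder
    *StringBuilder> eval_string john
    "John Doe" : String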

 

Conclusion


Idris offers some pleasant surprises for Haskell programmers, even without considering the dependent typing support. There are many other pleasant surprises I did not list, such as the distinction between print and printLn, which is now aligned with putStr and putStrLn.

These changes mostly show how helpful it is to be able to start over from scratch and fix the pesky mistakes made at the beginning.


Lisp Meta-Programming for C++ Developers: Automatic traces (Part 2)

In the previous posts, we started a series dedicated to familiarising C++ developers with the world of LISP meta-programming, based on the Clojure programming language. The goal is to offer a different perspective on meta-programming, how we can approach it, and the kinds of use cases it can address.

We started the series with an introduction to Clojure, homoiconicity, and how meta-programming works in LISP dialects. You can find our previous posts below:

 
In our previous post, we started implementing a way to automatically trace function calls along with their arguments. We showed how Boost.Hana helps with heterogeneous containers, and discussed the importance of supporting reflection as a language feature.

In today’s post, we will continue where we left off with our log API, and use meta-programming to automatically add calls to it in our functions. The absence of reflection in C++ will lead us to use the good old C-macros.

We will then have a look at how Clojure deals with the same problem. We will see that having access to AST manipulations, as well as side effects at compile time, tremendously helps clean up the code and separate concerns.

Note: this post builds on the previous posts of the series. Unless you already know Clojure and its macros, you need to read the previous posts to be able to follow this one.

 

Summary of the last episode


Let us start with a quick summary of our goal and what we started to implement in the previous post of the series.

 

Our goal

Our goal is to provide a very simple way for developers to add traces around their most critical functions, logging each of their calls. These traces should contain the name of the function as well as the names and values of the arguments passed to it.

[TRACE] Call to "repeat-n" with arguments: {n 3, s "toto"}

We want to support two main ways to plug in the traces: one that allows switching traces on and off dynamically, and one that is enabled or disabled at compile time.

 

Where we are

In our previous post, we built an API to manually log traces in both Clojure and C++. This API features a function to compute the log prefix from a function name, and a function to log the arguments of a function.

We demonstrate below how we would use the Clojure API if we had to manually trace the inputs of a function repeat-n:
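A sketch, assuming the two functions from the previous post are named log-prefix and log-args (the actual names may differ):

    (defn repeat-n [n s]
      (log-args (log-prefix "repeat-n") {'n n, 's s})
      (apply str (repeat n s)))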

The equivalent C++ code is shown below. Since C++ does not have a dedicated type for symbols (as it does not yet support reflection), we use strings for the symbol names:
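A hypothetical equivalent, assuming a log_named_trace function from the previous post that takes the function name and (name, value) pairs packed in a hana::tuple:

    #include <string>
    #include <utility>
    #include <boost/hana.hpp>

    namespace hana = boost::hana;

    std::string repeat_n(int n, std::string const& s)
    {
       // log_named_trace is the trace API from the previous post (assumed signature)
       log_named_trace("repeat_n",
          hana::make_tuple(std::make_pair("n", n),
                           std::make_pair("s", s)));
       std::string out;
       for (int i = 0; i < n; ++i)
          out += s;
       return out;
    }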

 

Our next steps

As shown by the example above, adding traces manually to our code is both verbose and error prone. We want to get rid of all this risky and noisy boilerplate.

We will provide a meta-programming facade on top of our API to make this kind of code as concise and error-free as possible. This facade will need to support the two use cases we mentioned earlier: run-time and compile-time switches for the traces.

 

Automated tracing in C++


We will start with C++. Because C++ does not (yet) support reflection, we will make use of C-macros (and their # operator) to extract the argument names.

 

First without argument names

As usual, we will start with a simpler problem. We will trace a function call, logging the function name along with its argument values (but not their names).

To do so, we will use the log_trace function we developed in our previous post, and our first goal is to call it as safely and concisely as possible:
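In the sketches below we will assume the following shape for that API (the exact signatures from the previous post may differ slightly):

    // Assumed API from the previous post:
    // - log_trace logs a function name and a hana::tuple of argument values
    // - log_named_trace logs a function name and a hana::tuple of
    //   (argument name, argument value) pairs
    template <typename Tuple>
    void log_trace(char const* fct_name, Tuple const& args);

    template <typename Tuple>
    void log_named_trace(char const* fct_name, Tuple const& named_args);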

We can write a higher-order variadic template function wrap_with_logs that calls the log_trace function before calling the function it wraps:
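A possible sketch:

    // Logs the call, then forwards the arguments to the wrapped function
    template <typename Fct, typename... Args>
    auto wrap_with_logs(char const* fct_name, Fct&& fct, Args&&... args)
    {
       log_trace(fct_name, hana::make_tuple(args...));
       return std::forward<Fct>(fct)(std::forward<Args>(args)...);
    }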

It helps, but the function name needs to be provided manually, as a string:
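For instance, with a plain repeat_n (without the manual traces of the previous section):

    auto result = wrap_with_logs("repeat_n", repeat_n, 3, std::string("toto"));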

The need to provide the function name manually is a potential source of inconsistency if the actual function name happens to change.

 

A bit of Preprocessor in the mix

We can automatically provide the function name to the wrap_with_logs function, using a C-macro and the # operator to stringify its function argument.
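A sketch of such a macro:

    // #fct turns the function name into a string literal at compile time
    #define LOGGED_CALL(fct, ...) wrap_with_logs(#fct, fct, __VA_ARGS__)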

Using it, it gets easier to add logs around our repeat_n function. We merely wrap the function call with our LOGGED_CALL macro. Ugly, but it works:
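For instance:

    auto result = LOGGED_CALL(repeat_n, 3, std::string("toto"));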

This avoids the inconsistency errors we discussed earlier: the function name appearing in the logs will automatically be the correct one.

 

Adding argument names

We solved the simpler problem, so we can now add our argument names back. We will adapt our wrap_with_logs function to call log_named_trace instead.

A variadic C-macro will not do this time: separating each argument name with a comma (to build the hana::tuple) cannot easily be done. So we will offer several C-macros, one per function arity. These macros can themselves be generated by a script.

We only show the first three of them:
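A hypothetical sketch, assuming a wrap_with_named_logs helper similar to wrap_with_logs but forwarding (name, value) pairs to log_named_trace:

    // Assumed helper: logs the named arguments, then calls the function
    template <typename Fct, typename... NamedArgs>
    auto wrap_with_named_logs(char const* fct_name, Fct&& fct,
                              NamedArgs&&... named_args);

    #define LOGGED_CALL_1(fct, a1) \
       wrap_with_named_logs(#fct, fct, std::make_pair(#a1, a1))

    #define LOGGED_CALL_2(fct, a1, a2) \
       wrap_with_named_logs(#fct, fct, std::make_pair(#a1, a1), \
                                       std::make_pair(#a2, a2))

    #define LOGGED_CALL_3(fct, a1, a2, a3) \
       wrap_with_named_logs(#fct, fct, std::make_pair(#a1, a1), \
                                       std::make_pair(#a2, a2), \
                                       std::make_pair(#a3, a3))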

It is slightly less intuitive to work with: we have to care about the arity of the function. But it manages to eliminate much of the boilerplate:
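Tracing repeat_n then amounts to defining a thin wrapping function (a sketch):

    std::string logged_repeat_n(int n, std::string const& s)
    {
       // #n and #s stringify the parameter names of this wrapping function
       return LOGGED_CALL_2(repeat_n, n, s);
    }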

Using the C-macro, there is no way to mess up the argument names anymore. The trick is that these argument names are the ones of the wrapping function.

 

Further improvements

The wrapped function repeat_n can be hidden or not, depending on whether we want to constrain the users of the API to the version with the traces:
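For instance, we could move the raw function into a detail namespace and expose only the traced version (a sketch; note that #fct would then stringify the qualified name):

    namespace detail {
       std::string repeat_n(int n, std::string const& s);
    }

    // The only public entry point goes through the traces
    std::string repeat_n(int n, std::string const& s)
    {
       return LOGGED_CALL_2(detail::repeat_n, n, s);
    }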

Now, we are not completely done. The initial requirements mentioned support for run-time and compile-time switches of the traces. These can be easily accommodated:

  • Run-time switches can be factored into the with_*_log functions
  • Compile-time switches can be offered through pre-processor flags

For instance, we can make the logs completely disappear from production builds through a preprocessor flag that modifies the definition of the LOGGED_CALL C-macro. These enhancements are left to the curious reader to implement.

 

Drawbacks

We managed to deliver the feature, but we had to rely on several different mechanisms, some of which are not part of our language:

  • Template Meta Programming
  • The Boost.Hana library
  • The preprocessor (for C-macros)
  • Likely our build system to carry the preprocessor flag

This diversity has some serious drawbacks. We need to learn several different mechanisms, each with its own way of working, which places a burden on the developer.

In addition, the “source of truth” is not in one single place: no single place gives the whole story, especially if we have to use the build system to carry preprocessor flags. All the different places where the “truth” is spread are nevertheless strongly coupled.

 

Automated tracing in Clojure


It is time to deliver the same service in Clojure, and to show how the ability to manipulate the AST directly and to perform side effects at compile time helps us avoid the difficulties we encountered in our C++ implementation.

 

Bring your own language

You can find below the implementation of the repeat-n function, which creates a string made of n copies of the string s:
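A straightforward implementation:

    (defn repeat-n
      "Create a string made of n copies of the string s"
      [n s]
      (apply str (repeat n s)))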

Direct AST manipulation in Clojure allows us to build our own language inside the language. We can create a macro that defines a function with the additional log traces built in. Here is how we can define repeat-n with automatic traces:
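The traced definition differs only by the defining macro:

    (defn-log repeat-n
      [n s]
      (apply str (repeat n s)))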

The only difference is the use of the macro defn-log (which we will implement just after) instead of the defn macro used to define a function in Clojure. That’s it.

 

A first implementation

Our goal is to mimic defn, so we first need to understand how it works. It is a macro taking 3 (or more) arguments:

  • The function name
  • A vector of arguments (named bindings)
  • The implementation (all remaining arguments)

We will take the exact same arguments to define our defn-log macro. It will return the same function definition as defn would, except that it will intercalate a call to our trace API at the beginning of the function:
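A sketch of the macro (compile-log-message is covered in the next section):

    (defmacro defn-log
      "Like defn, but adds a trace of the call at the start of the body"
      [name bindings & body]
      `(defn ~name ~bindings
         ~(compile-log-message name bindings)
         ~@body))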

The tilde (~) allows us to interpolate into the expression the result of the compilation of the log message (which we will see in the next section). The tilde-arobase (~@) allows us to interpolate a list of expressions (the whole body of the function, in our case).

We can try our new macro and see how it macro-expands. An interesting thing is that we are able to precompute the prefix at compile time:
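The expansion, tidied up for readability, shows the prefix already computed as a string literal:

    (macroexpand-1
      '(defn-log repeat-n [n s]
         (apply str (repeat n s))))
    ;; => (clojure.core/defn repeat-n [n s]
    ;;      (log-args "[TRACE] Call to \"repeat-n\" with arguments:" {'n n, 's s})
    ;;      (apply str (repeat n s)))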

 

Log message compilation

Our defn-log macro uses compile-log-message to add the tracing code and to compute the prefix of the log traces at compile time. Here is its implementation:
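A possible reconstruction, with log-args being the trace-logging function from the previous post:

    (defn compile-log-message
      "Return the code fragment tracing a call to fct-name.
       The log prefix is computed right away, at compile time."
      [fct-name bindings]
      (let [prefix (str "[TRACE] Call to \"" fct-name "\" with arguments:")]
        `(log-args ~prefix
                   ~(zipmap (map (fn [b] `(quote ~b)) bindings) bindings))))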

  • It is a simple function returning an AST, not a macro (*), so we can test it
  • The returned code fragment contains the call to the trace logging API
  • It also contains the computed function prefix (interpolated through tilde)

We demonstrate its behaviour below:
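Roughly (the exact namespace qualification depends on where log-args is defined):

    (compile-log-message "repeat-n" '[n s])
    ;; => (user/log-args "[TRACE] Call to \"repeat-n\" with arguments:"
    ;;      {(quote n) n, (quote s) s})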

If we call this function at compile time, the prefix computation will thus happen at compile time, reducing the amount of work done at run-time.

(*) Functions can return code fragments too. Macros are only needed when we want to intercept an AST fragment before it reaches the compiler. This is very useful in practice: it allows us to extract most of the heavy lifting of macros into testable functions.

 

Going further: adding support for destructuring

The previous implementation of compile-log-message does not work for functions making use of destructuring. Destructuring allows pattern matching directly on the arguments of a function.

As an example, the following function extracts the head and tail of the collection given as argument, and creates a map holding the destructured parts:
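A straightforward implementation:

    (defn head-and-tail
      [[head & tail]]
      {:head head
       :tail tail})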

Our defn-log macro will not work with this head-and-tail function. The ampersand symbol does not have any value associated with it. It is a purely syntactic construct, but our macro will capture it nevertheless, leading to a compilation error.

So we have to filter out the symbols that are only syntactic constructs. This is the task of the function bindings->bound-vars (implementation available on GitHub). Here is the corrected version of our compile-log-message, which will work with destructuring:
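The same reconstruction as before, with the bindings filtered through bindings->bound-vars:

    (defn compile-log-message
      [fct-name bindings]
      (let [prefix (str "[TRACE] Call to \"" fct-name "\" with arguments:")
            vars   (bindings->bound-vars bindings)]  ; keep only the bound symbols
        `(log-args ~prefix
                   ~(zipmap (map (fn [b] `(quote ~b)) vars) vars))))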

Now, we can trace destructured argument values automatically:
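For instance:

    (defn-log head-and-tail
      [[head & tail]]
      {:head head
       :tail tail})

    (head-and-tail [1 2 3])
    ;; [TRACE] Call to "head-and-tail" with arguments: {head 1, tail (2 3)}
    ;; => {:head 1, :tail (2 3)}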

 

The simpler the syntax, the better

Our previous example with destructuring shows how syntax impacts AST-based meta-programming. Every specific syntactic construct has to be dealt with. This represents a cost that has to be paid inside most macros.

There is very little separation of concern when it comes to syntactic constructs. The compiler sees everything, and so the macros (which intervene just before the compiler) see everything as well. The more complex the syntax, the harder it gets.

This is why Clojure, and Lisps in general, deliberately keep the core of the language as small as possible. Irregularities and specific syntax cost dearly (for tooling as well). There is value in not adding features to a language.

 

Enabling and disabling traces


We will end this post by improving our defn-log macro to add support for a switch enabling or disabling the traces at run-time or at compile-time.

 

Run-time switch

Adding a run-time switch to enable or disable the traces is actually pretty straightforward. We can just add an if statement to the code generated by defn-log:
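A sketch, assuming an atom named log-enabled-atom acts as the global switch (the name is illustrative):

    (def log-enabled-atom (atom true))

    (defmacro defn-log
      [name bindings & body]
      `(defn ~name ~bindings
         (when (deref log-enabled-atom)   ; checked at run-time, on every call
           ~(compile-log-message name bindings))
         ~@body))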

As we can generate whatever code we wish inside the if condition, we can get fancy and even introduce some more complex logic: retrieving some configuration to choose which logger to use, and more.

 

Compile-time switch

An if statement did the job at run-time. To implement the compile-time switch, we can use an if statement as well. We merely have to execute it at compile time:
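A sketch, using a log-enabled? predicate (defined below) evaluated at macro-expansion time:

    (defmacro defn-log
      [name bindings & body]
      (if (log-enabled?)                  ; runs when the macro expands
        `(defn ~name ~bindings
           ~(compile-log-message name bindings)
           ~@body)
        `(defn ~name ~bindings ~@body)))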

Our if condition can do anything we wish, even trigger side effects. We can for instance read a resource file (shown below) to enable or disable the traces.
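For instance, an EDN resource file such as this one (the file name log-config.edn is illustrative):

    ;; resources/log-config.edn
    {:enabled? true}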

Here is the implementation of log-enabled?, which reads the content of this resource file and extracts the :enabled? flag value from it:
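A sketch using only standard library functions:

    (require '[clojure.edn :as edn]
             '[clojure.java.io :as io])

    (defn log-enabled? []
      (-> (io/resource "log-config.edn")
          slurp
          edn/read-string
          :enabled?))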

The code emitted by the compiler will depend on the content of the resource file. Unless :enabled? is true, the emitted code will not contain any trace-related code.

 

Side effects in the compiler

Triggering side effects in the compiler through meta-programming is not just a technical curiosity. It delivers separation of concerns between the build system (traditionally used to convey flags influencing compilation) and the source code. The source code can go about its own business without the build system having to know or care.

You might have encountered plenty of examples in which the build scripts contained flags that are needed by some pre-processor macro somewhere, used to disable or enable some things at compile time (like asserts, logs, etc.).

You might also have encountered plenty of examples in which the build scripts contained configuration data needed by a particular web framework, or by a library generating nice GUIs from XML/JSON files, in any other language (C++, Java, JS, etc.).

These pieces of configuration are usually tricky, are coupled to the whole software, and contain lots of repetition (configuration files do not have the full power of a programming language). But it does not need to be so.

Allowing side effects inside the compiler (through meta-programming) lets us escape all these issues. Having a pure functional programming language at compile time is not enough: side effects are as useful at compile time as they are at run-time.

 

Conclusion and what’s next


In this post, we finished the implementation, in both Clojure and C++, of a small library that automatically traces calls to particular functions, logging their argument names and values.

We showed how direct AST manipulation inside the compiler allows us to build this feature in a very simple way. We demonstrated the value of not having too many syntactic constructs inside a language. And we talked about the value of triggering side effects in meta-programming.

In the next post of the series, we will see how meta-programming can help us express high-level concerns more declaratively, by describing them with data and delegating to a macro the generation of the corresponding code.


You can follow me on Twitter at @quduval.