Transforming data structures into types: an introduction to dependent typing and its benefits

The 1.0.0 of Idris has been released just a few months back, just enough to start trying out the language and some of the possibilities dependent typing offers.

Back in July, we implemented a type safe bowling kata in which we could not create a bowling game that would not satisfy the rules of the game. This was a show-off of the strong typing capabilities of dependent types. It was not however motivated by a real use case.

Today’s post is not a extensive show-off of the capabilities of Idris. Instead, it is inspired from a real and recent use case, in which I wish I had a dependently typed language to support me. Through a practical use case, we will discover that dependent typing:

Is not really more complex to use despite its sophistication
Avoids some of the nasty headaches with less advanced type systems
Limits the risk of rewrites when going the strong typing road

Disclaimer: The post features Idris. This is an introduction post. As such, it is written so that the knowledge of this language is not a prerequisite, but it helps if you know a bit about Haskell-like syntax.

The (adapted) test case

Let us say that we have a list of event to celebrate, and we want to build a small program to automatically publish some message when one of these events occurs.

What’s an event

Each event has a title such as “Birthday”. This title is a short name for the event, which allows to categorize it and identify it more easily. It is represented as a simple string.

Each event also has an associated message specification. This is the template of the announcement of the event. It can be viewed as a string with placeholders for missing values. For instance, the message specification “Happy Birthday: {int} years old, {string}” contains two placeholders.

The placeholders each have an associated type such as integer or string. Only a value of the corresponding type should be interpolated at this position. This avoids emitting strange messages such as “Happy Birthday: John years old, 37” instead of “Happy Birthday: 37 years old, John”

Our requirements

Our goal is to support the following needs:

Pretty printing an event (for documentation) with its title and message specification
Storing our events in a list to scan and automate tasks such as pretty printing
Parsing the message specification as a string, such as “Happy {int} birthday {string}”
Interpolating values inside the message specification to produce a valid announcement message

We would also like to support these needs in a type-safe way: it should not be possible to interpolate values of the wrong types inside a message specification. And if possible, we would like this to fail at compile time.

Note: Without going into much details, the original use case is about constraining the schema of a JSON-like data structure, based on an identifier associated to this particular structure.

First, catching errors at runtime

Let us start simple. We will focus on having a simple API which will catch the interpolation type errors inside the message specification at runtime. The next sections will deal with moving these checks at compile time.

Message specification

We start by defining some types that will allow us to describe a message specification. We will call this type Schema. It consists in a list of schema elements which each represents part of the message.

Each schema element is either some known text, or a placeholder for an integer or string value. For instance, here is how we could decompose the message specification “{int} years old, {string}”:

Placeholder        Placeholder
     |                  |
     v                  v
  "{int} years old, {string}"
        ^^^^^^^^^^^^
         Known text

Translated in Idris, a Schema is an alias for a list of SchemaElem which can be either:

IntVar: a placeholder for an integer value
StrVar: a placeholder for a string value
StrCst: a wrapper around a known string

	data SchemaElem
	= IntVar
	\| StrVar
	\| StrCst String

	Schema : Type
	Schema = List SchemaElem

view raw SchemaDefinition.idr hosted with ❤ by GitHub

To create a message specification, we just build a list of schema elements. Here is the code that corresponds to the message specification “”{int} years old, {string}”:

	-- Equivalent of: "{int} years old, {string}"
	[IntVar, StrCst " years old, ", StrVar]

view raw SchemaExample.idr hosted with ❤ by GitHub

Since it is much more convenient to describe the message specification as text, we also provide a function parseMessageSpec to parse the textual description into Schema value:

	> parseMessageSpec "{int} years old, {string}"
	[IntVar, StrCst " years old, ", StrVar]

view raw parseSchema.idr hosted with ❤ by GitHub

We now have a way to define a message specification quite succinctly.

Event definition

An event contains both a title and a message specification. This is easily translated into an Idris record, which is analog to a data structure with automatically defined accessors on it:

	record Event where
	constructor MkEvent
	title : String
	messageSpec : Schema

view raw Event.idr hosted with ❤ by GitHub

Instantiating an event is pretty simple. We just use the constructor MkEvent followed by a string that describe the title, and a message specification. Here are two examples of such events:

The birthDay message specification is described by writing the Schema manually
The newYearEve message specification is parsed from a string

	birthDay : Event -- Declaration
	birthDay = MkEvent "Happy BirthDay" [IntVar, StrCst " years old, ", StrVar] -- Definition

	newYearEve : Event
	newYearEve = MkEvent "New Year's Eve:" (parseSchema "Happy new year {int}")

view raw EventExample.idr hosted with ❤ by GitHub

Finally, accessing the members of an event instance is done by calling the accessors functions on it:

	> title birthDay
	"Happy BirthDay"

	> messageSpec birthDay
	[IntVar, StrCst " years old, ", StrVar]

view raw EventExample2.idr hosted with ❤ by GitHub

Pretty printing an event

Our requirements include the need to pretty print the definition of an event. This means printing the title, followed by pretty printing the message specification.

We will use the Show interface of Idris (an interface to specify show the equivalent of Java’s toString) and provide two implementations for both a schema element and an event:

	implementation Show SchemaElem where
	show IntVar = "{int}"
	show StrVar = "{string}"
	show (StrCst s) = s

	implementation Show Event where
	show e = title e ++ ": " ++ concatMap show (messageSpec e)

view raw ShowEvent.idr hosted with ❤ by GitHub

We make use of pattern matching on SchemaElem to distinguish placeholder from know text
We use concatMap to transform each schema element into a string and concatenate them

Going back to our birthDay event definition, we can now call show on it to pretty print it:

	birthDay : Event
	birthDay = MkEvent "Happy BirthDay" [IntVar, StrCst " years old, ", StrVar]

	-- In the REPL:
	> show birthDay
	"Happy BirthDay: {int} years old, {string}" -- Output

view raw PrettyPrintEvent.idr hosted with ❤ by GitHub

Interpolation of values inside the message specification

A message specification contains placeholder for integer and string values. To emit an announcement message, we first need to interpolate integer and string values in place of the placeholders.

We will start by defining a type to represent values that can be used to fill a placeholder, which we will name PlaceholderValue. It contains either an integer or a string:

	data PlaceholderValue
	= IntVal Int
	\| StrVal String

view raw PlaceholderValue.idr hosted with ❤ by GitHub

To interpolate these placeholder values inside the message specification, the logEvent event function will take as input an event (containing a message specification) as well as a list of placeholder values.

It will try to match the placeholders in the order in which they appear in the message specification, with the placeholders values in the order in which they appear in the list.

"Happy birthday: {int} years old, {string}"
                   ^                 ^
                   |                 |
                   v                 v
              [IntVal 32, StrVal "Quentin"]

Since this process might fail (due to arguments of the wrong type, or missing arguments), logEvent will return an optional value (Maybe in Idris) to signal it.

	-- Takes an Event + a List of PlaceholderValue as argument
	-- Returns an optional String as a result
	logEvent : Event -> List PlaceholderValue -> Maybe String
	logEvent = ?missingImplementation

view raw SchemaValue.idr hosted with ❤ by GitHub

The implementation of this function is not that important (it is mainly pattern matching the placeholder expected type against the placeholder value) but provided here as reference.

Examples of message specification interpolation

Below are some examples of using our logEvent function:

	birthDay : Event
	birthDay = MkEvent "Happy BirthDay" [IntVar, StrCst " years old, ", StrVar]

	-- In the REPL:

	> logEvent birthDay [IntVal 32, StrVal "Quentin"] -- Values match the placeholder expected types
	Just "Happy BirthDay: 32 years old, Quentin"

	> logEvent birthDay [StrVal "Quentin", IntVal 32] -- Wrong placeholder types (inverted)
	Nothing

	> logEvent birthDay [IntVal 32] -- Missing placeholder value
	Nothing

view raw SchemaValueExample.idr hosted with ❤ by GitHub

It is safe because wrong types for placeholder will lead to Nothing being returned as an answer (to indicate a failure) at runtime. So we cannot construct and therefore emit bad messages.

But we could argue that these type and missing values errors are programming errors, not user input errors, and could therefore be moved at compile time. We will see how to do this in the next section.

Evaluating the costs

Before we proceed into making our API type-safe in such as way that interpolation errors are caught at compile time, it is worth to consider the cost of doing so.

The cost of implementation

Our needs are pretty simple. A non-technical person would expect it to be done quite quickly. Yet, and depending in the language we use, making our API type-safe might be quite expensive.

There is indeed a hidden challenge here. Ensuring type safety, while still allowing message specifications to be parsed from a string, is no easy programming task.

Take C++ for instance. Type-safety would go through hoisting our message specification to the type level. Parsing the message specification will therefore have to be done at compile-time too. This would require using either template meta programming or constexpr functions.

This can be done. Ben Deane and Jason Turner showed how to parse JSON at compile time in there “constexpr ALL the things!” talk. But it is not for the feint of heart, and certainly not without a cost.

The question then becomes whether it is worth it. It is not if the cost of developing a fully type-safe implementation is higher than the cost of dealing with or correcting the errors.

The cost of change

We also need to evaluate the cost of migrating our previously developed API into this new compile-time type safe API. Think about the refactoring needed to accommodate a single new requirement, type safety. Think about the potential cost of migrating potential existing users.

Can we even keep a single function from the previous API?

Now, it turns out that in Idris, we do not need to change anything of our previous API. To make it fully type safe, we only have to add two new functions. And the use of the word add instead of change is important here: we do not need to break anything.

Now, with compile time checks

In our first design, interpolation errors could only be caught at runtime. Let us see how we catch them at compile time instead, without breaking our previous code.

Overall design

We want a way to interpolate placeholder values inside a message specification such that any missing or invalid placeholder values is caught at compile time.

To get an idea of where we want to go, let us write some client code of the API first. Here is one possible such client code, where logEvent is a function which:

Takes as first argument the event to be announced
Takes as subsequent arguments its placeholder values

	birthDay : Event
	birthDay = MkEvent "Happy BirthDay" [IntVar, StrCst " years old, ", StrVar]

	-- In the REPL:
	> logEvent birthDay 32 "Quentin"
	"Happy BirthDay: 32 years old, Quentin"

view raw TypeSafeGoal2.idr hosted with ❤ by GitHub

Now, it is important to notice that the number and the types of the arguments of this logEvent function depend on the value (not the type) of the first argument. A perfect match for dependent types.

How we will proceed

Thanks to currying, we know that a function that takes N arguments can be seen as a function that takes one argument, and returns a function which expects N-1 arguments.

So we can see the logEvent as a function which takes an event and returns a new function, which we will name f here. This function f expects as many arguments as there are placeholders in the event, each with the appropriate type. It will return the resulting announcement message as a string.

To illustrate this, we can use :t (in the REPL) to query the type of the function f returned by the partial application of logEvent on an event:

	birthDay : Event
	birthDay = MkEvent "Happy BirthDay" [IntVar, StrCst " years old, ", StrVar]

	-- In the REPL:
	> :t logEvent birthDay -- Query the type of logEvent applied to the birthDay event
	Int -> String -> String -- The type of a function expecting an integer and a string to produce a string

view raw TypeSafeGoal.idr hosted with ❤ by GitHub

Writing logEvent consists in finding how to write this function f. We will proceed in two phases. We will first see how to compute the type of this function f, provided a message specification (ultimately the one of the event). Then we will complete the implementation.

Computing types

So how do we compute the type of this mysterious f function? And what does it mean to compute a type? In Idris, it means nothing in particular. A perfectly normal function can take a type as input, or return a type as output. Computing a type is calling a function which returns a type.

We will now write a function SchemaType to compute the type of f from the schema of a message specification. We recall that a Schema is a list of schema element, where each element is either a known text fragment or a placeholder:

	data SchemaElem
	= IntVar
	\| StrVar
	\| StrCst String

	Schema : Type
	Schema = List SchemaElem

view raw SchemaDefinition.idr hosted with ❤ by GitHub

We can use recursion on the list of schema element to slowly build the type of a function. At each recursive call, we will pattern match on the list, until we reach the empty list:

	-- Function which takes a Schema and returns a function Type
	SchemaType : Schema -> Type
	SchemaType [] = String
	SchemaType (IntVar :: xs) = Int -> SchemaType xs
	SchemaType (StrVar :: xs) = String -> SchemaType xs
	SchemaType (_ :: xs) = SchemaType xs

view raw SchemaType.idr hosted with ❤ by GitHub

If the list starts with an IntVar placeholder, we recur and add a Int argument in first position
If the list starts with an StrVar placeholder, we recur and add a String argument in first position
If the list starts with known text, we recur, not adding any new argument

We can try our SchemaType function in the REPL to convince ourselves that it works:

	> SchemaType [IntVar, StrCst " years old, ", StrVar]
	Int -> String -> String

view raw SchemaTypeTest.idr hosted with ❤ by GitHub

The type of Log Event

Until now, we carefully avoided to define the type signature of the logEvent function. Now, that we defined SchemaType, we can finally do it.

To do so, we extract the message specification from the event given as first argument of logEvent, and we call SchemaType on it, inside the type signature of logEvent:

	logEvent : (event: Event) -> SchemaType (messageSpec event)
	logEvent e = ?unspecified

view raw logEventPrototype.idr hosted with ❤ by GitHub

There are important characteristics of Idris to notice here:

Idris allows to refer to other arguments’ value by name in any type signature
We can call functions such as messageSpec in the type signature of a function
We can call functions such as SchemaType to compute part of the type signature

In our specific case, the return type of logEvent refers to the value of its first argument, event. We then extract the message specification out of it, in order to compute the rest of the type of the function.

Now, if we ask for the type of the result of this function, we get what we expect:

	> :t logEvent birthDay
	Int -> String -> String -- A function that expects an integer and a string to produce the message

view raw logEventBirthDay.idr hosted with ❤ by GitHub

Finishing the implementation

Now that we have the type signature of logEvent, it is only a small matter of programming to complete the implementation of the function. We just use the same trick we already used: recursion.

Each recursion level looks at a schema element, and builds a lambda which takes one placeholder value (the syntax \argument => body of the lambda). Inside the lambda, we simply concatenate the result of serializing the placeholder value with the rest of the message.

	logEvent : (event: Event) -> SchemaType (messageSpec event)
	logEvent e = loop (messageSpec e) (title e ++ ": ")
	where
	loop : (messageSpec: Schema) -> String -> SchemaType messageSpec
	loop [] out = out
	loop (IntVar :: rest) out = \i => loop rest (out ++ show i)
	loop (StrVar :: rest) out = \s => loop rest (out ++ s)
	loop (StrCst s :: rest) out = loop rest (out ++ s)

view raw TypeSafeLogEvent.idr hosted with ❤ by GitHub

We can try it and check that it works fine when correctly used:

	> logEvent birthDay 32 "Quentin"
	"Happy BirthDay: 32 years old, Quentin"

	> logEvent newYearEve 2017
	"New Year's Eve: Happy new year 2017"

view raw LogEventSuccess.idr hosted with ❤ by GitHub

And we can check that it fails to compile when incorrectly used:

	> logEvent birthDay "Quentin" 32
	".. Fails to compile ..."

	> logEvent birthDay 32 32
	".. Fails to compile ..."

view raw LogEventFailure.idr hosted with ❤ by GitHub

We know have a fully type-safe interpolation of values inside the message specification. And we did not break our previous implementation. In fact, both versions can co-exist.

Conclusion

In today’s post we showed some important feature of Idris type system, and of dependent typing in general: being able to compute types based on values.

We applied it to a use case, inspired from real a real one. We showed that it allows to build a stronger type-safety than what is allowed with more traditional type systems.

In addition of stronger type-safety, dependent types also fill the gap between values and types. They allow us to move between runtime and compile time checks in a smooth way. This helps removing some of the costs of using strong types in statically typed languages.

In particular, the risk of having to rewrite entirely an API is reduced substantially (as well as the risk to adapt or migrate its client code) upon changes in the requirements.

Follow @quduval