The 1.0.0 of Idris has been released just a few months back. In previous posts, we went over 10 differences between Haskell and Idris and illustrated the dependent typing features of Idris through the implementation of the rules of the Bowling game at the type level.
We will continue this series on post on Idris, by implementing a small transducer library (the Idris package is available in this GitHub repo.).
The goal will be to provide Idris with a library efficient composable algorithmic transformations. This is something we quickly feel the need to when playing with Idris, due to the strict (non-lazy) nature of the language.
Motivation for transducers in Idris
Let us first start by discussing why transducers are an interesting library to port in Idris.
Lazy vs Strict
As mentioned in the post listing 10 differences between Haskell and Idris, Idris is strict by default, while Haskell is lazy by default. Whether or not laziness is a good default is a common subject of discussion in the Haskell community.
On the bad side, laziness is often complained about for its tendency to introduce non obvious memory leaks or other performance defects in Haskell code. On the good side, laziness allows to separate processes that produce data, from the one transforming or consuming it (*).
(*) John Hughes writes in lot more details about the benefits of laziness in terms of software modularity in its great paper Why functional programming matters.
Thanks to laziness, we can easily compose map, filter and folds together in Haskell and not worry to much about intermediary list creation. But this is not the case in Idris.
The following two lines of Idris code will not execute at the same speed: map (+1) will need to first realize the whole intermediary list before take extracts only the 10 first elements:
In Idris, we need to implement ways to emulate laziness in order to get back the efficient composition of algorithm which is native in Haskell.
Transducers are one way to implement efficient composition of algorithms. Transducers comes from the Clojure world and have been built with efficiency and reuse in mind:
- They eliminate intermediary sequence or containers when composing algorithms
- They decouple the transformation from the data source and destination
This makes transducers a great tool to build generic pipe-lines of algorithmic transformation, reusable in quite different contexts, and which can be composed very efficiently.
Step functions & Reducing functions
We will start with some definitions and define their associated types. The vocabulary is directly inspired from Clojure, with some minor twists. Defining this vocabulary will guide us through the main concepts of pipelines of data transformation.
A stateless step function is a function whose purpose is to be used in a left fold (also known as reduce in Clojure or std::accumulate in C++).
It combines an element coming from a data source (a container for instance) with some intermediary result, to compute the next intermediary result. Put differently, a step function represents a task to be performed in one iteration of a pure loop (without states and side-effects).
- We call accumulator (or acc) the result being computed
- We call element (or elem) an element of the data source to read
We can define a type alias that will help us formalise this concept in the code:
For instance, the type alias StatelessStep Int String represents a step function that consumes strings to produce a result of type integer. It could be a function that sums the length of strings (in a fold):
A stateful step function is a step function that needs some additional state to do its job properly. For instance, splitting a collection into chunks of equal size requires some kind of state: elements have to be kept in store until the completion of a chunk.
Because we are in a pure language, we will avoid relying on side-effects to track the state of our step function. Instead, we will rely on the State Monad:
For instance, the type alias Step Bool Int String represents a step function that consumes strings to produce a result of type integer, and maintains a state of type boolean. It could be a function that sums the length of strings, ignoring one of every two strings it encounters:
Adding early termination
We will also want to express early termination for algorithms such as take that do not need to consume the whole data source. To do so, we will enhance our stateful step function to return a result decorated by a status:
- The step can return Done to indicate early termination of the loop
- The step can otherwise return Continue to proceed with the rest of the loop
For instance, the type alias Step () Int String represents a step function that consumes strings to produce a result of type integer, and maintains no state. It could be a function that sums the length of strings, stopping when the sum exceeds a given value:
Note: this ability to return early combines especially well with state (for instance to build functions such as take or drop). But we just saw that it also makes sense without state.
Reducer (or reducing function)
A reducer (or reducing function) is what we get when we bundle a stateful step function with its state. The following Idris code creates a type Reducer that contains a field state and two functions runStep and complete:
The runStep function is doing the heavy lifting. It takes as input the current state, the current value of the accumulator, and an element. It returns the a new accumulator and the state to use for the next iteration. It represents the content of a loop.
The complete function is there to deal with the remaining piece of state. It represent the termination of the loop, the last thing to perform once the source of data is consumed. It allows to discharge remaining pieces of state at termination.
Example of reducer
Going back to our previous example, we can package our sumLengthOfEveryOddStrings step function (which sums the length of strings, ignoring one of every two strings it receives as input) with its state (a boolean).
Having packaged the stateful step with its state, we get a reducer that is self sufficient. It can be plugged into reduce to execute it on an input collection:
We have not introduced the reduce function yet, but you can see it as a generalised fold which manage both state and early termination.
A transducer is a transformation of reducers. It takes as input a reducer, and returns another reducer. The new reducer includes additional functionality. In the process of adding new functionality, a transducer can both:
- Change the type of the state (s1 to s2), usually to track some more state
- Change the type of the element (elem1 to elem2) being accumulated
The type of the accumulator, on the other hand, cannot be modified (the result we expect from a loop is fixed). This leads to the following type alias for transducers:
For instance, the type alias Transducer Int () Bool String Char represents a transducer that transforms:
- A stateless reducer accumulating strings into a result of type integer
- To a reducer accumulating characters into result of type an integer
- And adding a boolean state in the output reducer
Because transducers are just functions from one reducer to another reducer, they can be composed together just like normal function can. Composing transducers lets us:
- Gradually add features to pipe-line of data transformation
- Avoid realizing intermediary containers: we only compose recipes
For instance, we can compose a transducer than adds a filtering behaviour on a reducer to a transducer that maps each element to its square:
Because of this, we can see transducers as pipe-lines, each each step is responsible for its own transformation, and forwards resulting elements to the next transducer in the line.
Note: You may have noticed that the type alias for transducer had elem1 and elem2 reversed. This is because transducers compose from right to left (the direction of composition), but the data flows from left to right (the direction of pipeline).
Running the loop
Reducing functions provides us with recipes to consume a stream of value and summarize it as a single result. Transducers allows us to build such recipes out of smaller recipes. But to make these recipes useful, we need to way to execute them.
In this section, we will dive into the implementation of reduce and transduce, two function that will allow us to execute the recipe on some data. You can skip this section if you are not interested in the implementation, and jump to the next one for example of transducers.
The library provides a function named reduce that we used already in this post. It executes a provided reducing function (a recipe), given:
- An initial value for the accumulator
- A data source we can fold over to retrieve a stream of values
It will consume elements of the data source, executing the recipe at each iteration, until the stream is totally consumed or until the recipe asks for early termination. Then it will call the completion function on the result before returning it.
This implementation relies on runSteps to run the loops until either the stream of values is totally consumed, or the reducing function asks for early termination (returning Done).
Running the steps
The implementation of runSteps relies on Continuation Passing Style to support the early termination.
- foldr builds a chain of functions, one for each element of the source
- Each of these functions represents one iteration of the loop
- Each iteration and is provided with the next one: nextIteration
- If nextIteration is not called, the next iteration is not run and so the loop stops
The library provides a second function named transduce that is only a thin but useful layer above reduce. It reduces (pun intended) verbosity for the most common use cases.
Indeed, a lot of pipe-lines of data transformation ends up with a stateless step function (like a sum, a concatenation, etc.). In such cases, using transduce instead of reduce removes a bit of noise. Here is an example in which we sum the square of odd numbers:
One other great advantage is that it allows to use the word transduce in our code, making your Idris code almost as cool as Clojure code.
Building our own transducer
It is time to build our very own transducers. We will keep it simple in this post, and focus on stateless transducers. We will also use a helper function statelessTransducer (available in the library) to abstract away some of the details of the construction of a transducer.
The goal is to build an intuition for those who are foreign to the concept. The next post will deal with stateful transducer and unveil the details behind the helper functions.
Mapping consists in using a function from a to b to transform an input reducer operating on element of type b to a reducer operating on elements of type a. It takes elements of type a and sends elements of type b to the next element of the pipe-line.
In the code below, next represents the step function of the next transducers in the pipe-line, while fn represents the mapping function from a to b.
- By applying the function fn on an element of type a, we get an element of type b
- We can give this element to next, which consumes elements of type b
Here is an example in which we sum the length of a bunch of strings, multiplying each length by 2 along the way:
Note: Thanks to transducers and ability to compose in terms of recipe, we get the following equivalence by construction: mapping f . mapping g = mapping (g . f). Note that f and g are inverted compared to the usual Functor law because of the left to right flow of data.
Filtering consists in transforming an input reducer operating on element of type a to a reducer operating on only those elements a that satisfy the a given predicate.
In the code below, next represents the step function of the next transducers in the pipe-line, while pf represents the predicate function from a to Bool.
By calling next on only these elements that satisfy the predicate, we effectively filter these elements from the input source data: the rest of the pipeline will not see them.
Here is an example in which we sum the length of strings that do not start with the letter ‘a’:
Concat mapping is just like mapping, to the exception that the function provided to the catMapping function may return several b for one a.
In the code below, next represents the step function of the next transducers in the pipe-line, while fn represents the mapping function from a to a collection of b.
By calling next on all the the elements output by fn, catMapping takes elements of type a from and the left, and transmits elements of type b to pipe-line on the right. The main difference with mapping is that the number of bs is not necessarily the same as the number of as.
Here is an example in which we sum the length of strings, counting each of them twice:
Conclusion, and what’s next
In this post, we went over basics to define transducers in Idris. We defined the concepts and our main types, detailed the main function reduce and transduce, and built very simple transducers.
In the next post, we will see how to define more interesting transducers, such as take, takeWhile, interspersing or groupingBy. We will also look at some helper functions that helps building these transducers more easily.
The library is available in this GitHub repository. Any suggestions for improvements is obviously welcomed.