Building a Clojurescript game: AI

This post is the sixth in the series of posts focused on the design and implementation of a port in ClojureScript of the game named Tribolo. You can try the game at the following address.

Our first post described the game and identified the following responsibilities, each of them will be developed in a different post:

  1. Game logic: the rules governing transitions between game states
  2. Rendering: transforming the current game state into HTML and SVG
  3. Store: the in-memory data-base holding the application state
  4. Game loop: coordinates and control of the different aspects
  5. AI logic: selection of the best next move for an AI player

Among these different parts, we already implemented the game logic as well as the rendering and the game loop. The goal of this post will be to implement a decent artificial intelligence for our Tribolo game.

 

Reminder on the target architecture


Our different responsibilities are interconnected through the following high level architecture (you can refer to our first post for more details):

   Store ----------------> Rendering ---------------> Reagent
     ^      (reaction)         ^      (send hiccup)
     |                         |
     |                         | (listen)
     |       (update)          |
     \-------------------- Game loop
                               |
                               |
     ,-------------------------| (use to update store)
     |                         |
     v                         v
 Game logic <-------------- AI logic

The last missing piece to complete the game is our AI. Until now, we assumed the existence of a function find-best-move that would implement this AI.

This function takes as input a game and returns what it believes to be the best coordinate for the current player to play at.

The goal of today is to find a satisfying implementation for this function.

 

Designing our AI


Before diving in the implementation, we will start by thinking a bit about what we want the achieve, and how we want to achieve it.

 

Choice of the weapon

There are many different types of games. Some games naturally encourage specific pattern for the development of their associated AI. Our game follows a structure such that it is easily possible to build a Game Tree for it.

A Game Tree is defined as a directed graph whose nodes are positions in a game and whose edges are moves in Wikipedia. This is pretty much the structure followed by our Game Logic API:

  • A turn represents a valid point in time in a complete game
  • A transition is a valid progression from one turn a next turn

One overall good strategy to explore a Game Tree is to use the Minimax algorithm. This algorithm was originally created to deal with two player games, so we will need to adapt it to make it work for the 3 players of Tribolo.

 

No Alpha – No Beta

The issue with the Minimax algorithm is the combinatorial explosion due to the high branching factor of the Game Tree. One good complement that allows to reduce the search tree is to use Alpha Beta pruning.

As for the Minimax, this algorithm was originally developed to deal with two player games. But unlike the Minimax, the Alpha Beta pruning is pretty hard to adapt to deal with three player games. You can check this paper for more details.

In addition to this, the rules of Tribolo are such that there is no clear alternating sequence of players. A transition might make another player unable to play, skipping his turn.

This means the next turns might not have the same player. Since the player drives whether we should minimize or maximise, this in turn means that there is no guarantied interleaving of min and max phases.

All these concerns conspire to make it really hard to come up with an implementation of Alpha-Beta pruning for Tribolo. So we will have to be do with a Minimax implementation, adapted for three players, without branch pruning.

 

High level AI

Our Minimax algorithm will not be able to go very deep in the Game Tree. But we might be able to offset this by choosing good evaluation functions. An evaluation function is defined in Wikipedia as a function used by game-playing programs to estimate the value or goodness of a position in the minimax algorithm.

In general though, there is no single evaluation function that will fit all situations best. One common approach is to select different evaluation functions depending on the stage of the game.

  • Are we in in the early game? Or in the late game?
  • Is the AI in a specific dangerous situation?
  • Is the opponent in a specific dangerous situation?
  • Is the AI loosing? Or is it winning?

Our AI will try to adopt different evaluation functions to deal with some specific situations. We will call these AI strategies. Selecting the right strategy can have a big impact on how good the AI performs.

 

AI Strategies

One obvious evaluation function for Tribolo is to use the score of the game. Namely, we could optimize for the number of cells owned by the player. This is a good strategy for the late game especially, in which we only care about winning.

Among the specific situation, there are cases in which a player is in great danger of not being able to play a single move in the following turns. We will call this situation being trapped. We might change the evaluation function to maximize the number of available transitions when such a situation seems to come at horizon. This strategy is very useful in early or middle game, but much less useful in late game.

Last but not least, we can make the AI more aggressive against the winning player. To compensate for the lack of pruning in our Minimax implementation, the AIs will both ally against the player (to ruin his score) when the player is winning. This strategy is valid in all stages of the game.

These three strategies are the ones we will implement for our Tribolo game. Our design will need to support them (and an easy way to add more in general).

 

Minimax implementation


We determined that our algorithm of choice will be a variation on Minimax. We need to adapt the simple Minimax to include the support for:

  • Three players instead of two players
  • Non standard alternation of players
  • Different evaluation functions

 

AI Strategy Protocol

Based on the fact players might alternate in different patterns, and based on the fact we want to support AI alliance against the player, the MIN and MAX parts of the Minimax algorithm will need to be parameterized.

In addition, we have to support different kind of evaluation functions. This leads us to define a protocol for an evaluation strategy that contains two methods:

  • eval-turn represents the minimax evaluation function for a turn
  • maximizing? indicates whether or not we want to min or max the scores

We call this protocol AIStrategy:

For instance, a strategy where both AI play against the human player would provide an implementation of maximizing? that would return true for turns played by the AI and false otherwise. It would naturally lead to both AI helping each other.

 

Minimax

Our next task is to implement a variant of a fixed depth Minimax implementation, parameterized by an implementation of the AIStrategy protocol we defined in the previous section.

This algorithm can be described by a nice recursion. The base case is when we reach a leaf of a search tree (the maximum depth we want to search for, or the end of the game), in which case we apply the eval-turn to give a score to the turn. Otherwise, we recur and apply a min or a max on the scores returned by each of the sub-trees.

This can be summarized with the following piece of code, which makes use of the function minimax-step to implement the recursion (which we will see after):

 
The implementation of minimax-step follows the open recursion technic. Instead of having minimax calling itself directly, we provide to minimax-step the function to call to recur. Thanks to this, we will be able to reuse this open recursive definition to implement our AI Strategies later.

Our minimax-step function will also be parameterizable with different min and max implementations. This flexibility will also help us a lot later.

 
Note 1: We do not need to worry about blowing the stack with such recursions. The search tree grows exponentially with the search depth, so we will never be able to search very deep anyway.

Note 2: If you are curious about open recursion and its different usage, you can read more about it in this previous article in which we used it to help with the implementation memoization.

 

Top level search

Our minimax function attributes a score to an turn by looking at the game tree starting from it. To find our best move, we need to call our minimax function for each reachable turn from our initial turn.

We should be able to re-use our minimax-step with one twist. As we need to keep track of the coordinate to play, we want to return tuples containing both a move and a score and maximize by score alone.

To customize the comparison, we create a variant minimax-step-by that accepts a projection for the comparison:

 
Using this minimax-step-by, we can implement our top search game-tree-search function. It takes an AIStrategy, a turn from which to start the search, and a maximum depth search. It returns what it believes to be the best move to play.

 

High level AI


Our Minimax-based AI algorithm is now ready and implemented by game-tree-search. Our next job is to implement useful AIStrategies to use in this game tree search.

 

Optimizing scores

Our first AIStrategy consists in optimizing the score of a chosen player. Because our turns contains the transitions to the next turns, we can be clever and use this data to go one level deeper in the search.

We will use our open recursive minimax-step function, but instead of calling our minimax recursively, we provide it with a function that computes the scores taking into account the transition. This computation is much cheaper than accounting for the whole transition.

We extract this logic inside the eval-next-score-of function, that takes as input the players whose scores should be summed and optimized for.

 
Based on this previous function, we can easily derive the implementations of our two AI that care about the scores:

  • The optimize-own-score-ai which optimize the score of the playing AI
  • The optmize-ai-scores-ai which optimize the scores of both AI (allied)

 

Avoiding traps

Our second family of AIStrategy is about avoiding being trapped with no move left to play. The corresponding evaluation function will maximize the number of available transition instead of the score. This is pretty simple to implement:

 

Assembling the pieces


Both our Minimax algorithm and our AI evaluation functions are ready. We only need to plug these parts together.

 
We first implement a High Level AI that will select the appropriate AI strategy to use depending on the game configuration.

Note: the details of the different predicates used in the function above are available in the following GitHub Gist.

 
Our last task is to complete our find-best-move function, which was the goal of this post. This functions is the only public function of the whole AI namespace and does:

  • Select the appropriate AI strategy using the high-level-ai function
  • Run a game tree search using game-tree-search and the selected AI

 
Note: The resulting search is performed with a depth of 2 which is quite low. In reality, because our score evaluation function also looks ahead, we get a search tree of depth 3. This strikes a balance between the need for the AI to run on the Web browser of low power devices (tablets or smart phones) and the difficulty.

We could have chosen different depth for different devices, but it would make the game harder on PC than on smartphones. Apart from being annoying to implement, it is not quite fair. To go deeper, we would instead need to add pruning to our game tree.

 

Conclusion


With this last post focusing on the implementation of the artificial intelligence, our Tribolo game is now fully complete. You can check the source code in the following GitHub repository.

I hope you did enjoy the series and the game that we built through it. I hope it did give you some ideas, made you discover at least one thing, or was as much fun as it was for me to implement it.

 

Special thanks


I would like to thank Manwe56 for our discussions on how to build an smart enough AI. His great insights helped me in designing my first ever artificial intelligence. In particular, I owe him the concept of high level AI, which helped me tremendously.

Manwe56 is regularly present in the top 1% of some of the major competitive programming contests. He is also the author of the competitive-programming GitHub repository that contains some generic implementation of the most common AI algorithms for both Java and C++.

Building a Clojurescript game: Application state

This post is the sixth in the series of posts focused on the design and implementation of a port in ClojureScript of the game named Tribolo. You can try the game at the following address.

Our first post described the game and identified the following responsibilities, each of them will be developed in a different post:

  1. Game logic: the rules governing transitions between game states
  2. Rendering: transforming the current game state into HTML and SVG
  3. Store: the in-memory data-base holding the application state
  4. Game loop: coordinates and control of the different aspects
  5. AI logic: selection of the best next move for an AI player

Among these different parts, we already implemented the game logic as well as the rendering. The goal of today is to implement the application state and the game loop.

 

Reminder on the target architecture


Our different responsibilities are interconnected through the following high level architecture (you can refer to our first post for more details):

   Store ----------------> Rendering ---------------> Reagent
     ^      (reaction)         ^      (send hiccup)
     |                         |
     |                         | (listen)
     |       (update)          |
     \-------------------- Game loop
                               |
                               |
     ,-------------------------| (use to update store)
     |                         |
     v                         v
 Game logic <-------------- AI logic

 

Store

The store represents our application state. The application state is kept inside a Reagent ratom. We can use Reagent reaction macro to create views on the main ratom.

Each of these views will be automatically refreshed when the ratom they dereference changes. Rendering functions dereferencing a ratom or a reaction are implicitly wrapped in a reaction. They will be refreshed upon changes of their sources.

 

Game loop

The game loop is the reactive component that connects the different components of the application. It listens to all the events of the application and triggers the appropriate responses. In particular, the game loop will be responsible for:

  • Listening for the user interface events
  • Triggering the artificial intelligence actions
  • Updating the application state in response to these events

We did not develop the artificial intelligence yet, so we will assume the existence of a function find-best-move. This function will take as input a game and returns the coordinate for the current player to play at. We can provide it a nice spec to specify it more precisely:

 

Application state


The application state is the place where every piece of state-full data is put in. We need to identify the data we need to store and then organize it. Fortunately, our application is pretty simple. We need to store the game and whether the suggestion feature is on. A simple map will do:

In case the data to store is more complex, there might be better solution than storing map inside the application state, for example libraries such as DataScript.

 

Reactive views

The application data is definitively not at rest. Most of our user interactions will update it. Since consumers are generally interested in parts of the application state only, they will not be interested by most updates.

To accommodate the needs of the application data consumers, and avoid spamming them to much, we define a bunch of Reagent reactions. For instance, the game reaction zooms on the :game keyword of our store and the current-turn further zooms on the current turn of the game.

We list below the most useful reactions for our user interface:

 

Reactive functions

We can also define reactions holding functions. For instance, the suggestions reaction will hold a function to provide to the rendering. This function will allow identifying whether a cell should be highlighted as an available move for the human player.

If the current player is either an AI or if the suggestion feature is disabled, the reactive function will always return false. Otherwise, the reactive function will do a search inside the set of coordinates leading to a valid transition.

 

Updates

The game loop is in charge of triggering the updates of the store in response to events. Our store will define some convenience functions to help with these updates:

  • swap-game! applies a function on the game of the store to update it
  • toggle-help! switches the suggestion feature on and off

 

Game loop


The game loop is the bridge between the events received in the user interface and the update of the application state. The game loop is also responsible for the scheduling of events such as triggering the artificial intelligence computation and play.

 

The need

Our Tic Tac Toe did not need any game loop. Updating the application state was done directly inside the callbacks listening the user interface events. What is it exactly that make us want a game loop?

Our Tribolo game has become complex enough to make the approach of directly updating the application state less practical. In particular, the following time constraints will require us to do something more involved than direct updates:

The artificial intelligence will require a bit of time to compute its next move. During that time, the human player shall not be able to play. To avoid any weird interactions, the best is to ignore all play commands from the human player when the AI plays.

The other events (such as restart or undo) triggered during the AI turns should however still be listened to. Such events will completely change the state of the game, and will thus need to discard any on-going AI computation. New AI computations might have to be triggered instead.

The timings and animations also participate in augmenting complexity. The AI might take variable time to compute its move. We would like to avoid the human player to witness instantaneous moves. This would make the ongoing game hard to understand, especially if the moves are chained faster than the animations.

To summarize, the need for the game loop comes from the increased complexity of the pattern of interactions with the user.

 

Requirements

We can translate the various aspects we listed in our previous section into requirements for our game loop:

  • Menu events should drop any ongoing AI computations
  • The human player play events should be ignored during AI computations
  • AI computations should not intervene during human player turn
  • We impose a delay of 1.5 seconds before an AI can play its move

 

Core Async to the rescue

One of the great advantages of using ClojureScript over JavaScript is that it comes with core.async. This library is really good at managing time dependent and concurrency aspects in software.

We will rely on this library for our game loop. You might have an easier time following the remaining of this post if you are familiar with the concept of Communicating Sequential Processes on which is based core.async.

As a refresher, CSP is built around the concept of lightweight processes synchronising and exchanging data through the use of channels. Channels are things you can send a message into or receive a message from.

In Clojure, a go block is such a lightweight unit of computation and chan allows to build channels. These processes run concurrently and park when they wait for inputs from a channel, or wait for an output channel to have some space to write into.

One interesting aspect of go blocks is that they return a channel. This channel will be sent the result of the last expression executed in the go block. It means we can create the equivalent of futures by just wrapping an expression inside a go macro.

 

AI computations

The following example demonstrates some of the power of core.async. Thanks to it, we are able to deal with what would be otherwise annoying callbacks interactions triggered using js/setTimeout.

The ai-computation function starts an AI computation asynchronously. The goal is to keep the UI responsive to events during the computation. We use async/timeout to create channels that will delay the result computation for at most 1.5 seconds before releasing it. This means the AI will play at most once every 1.5 second.

Note: We split the delay in two parts, 500 ms followed by 1 second. The first delay is used to explicitly give control to the browser for the rendering of the transitions. The user interface would otherwise experience some lag.

 

Handling events

The game loop will receive two different kinds of events. It might receive play events which are the events that move the game forward. It might also receive game events which are the other kind of event that affect the game: restart, undo and new game.

In in the play events we can further distinguish those coming from the user interface (the human player) and those coming from the asynchronous AI computation we described in the previous section.

We can summarize this in the following ASCII Art diagram:

                      Click on the board
                              |
                              |
                              v
   Game events           Play event <-----\
        |                     |           |
        |                     |           |
        \----------.----------/           |
                   |                      |
           (feeds) |                      |
                   |                      |
                   v                      |
               Game loop                  |
                   |                      |
        (triggers) |                      |
                   |                      |
                   v                      |
             AI Computation --------------/     

 
The clicks on the board should be discarded when the current turn is the one of an AI. We can use a transducer on a channel to act as a latch that disable or enable the channel. Depending on whether the current player is an AI or not, the game loop will also need to select the appropriate play event channel to listen to.

This process of selection and filtering is encapsulated inside the start-game-loop function which creates the loop and returns the two input channels of the game loop:

  • The play-events channel in which clicks on the board will be sent to
  • The game-events channel for the restart, new game and undo events

This function makes use of the handle-game-event! responsible for dealing with the restart, new game and undo events:

 

Game loop public API

Until now, all the functions we described are private and describe the implementation of the game loop. In particular, start-game-loop implements the creation of the loop.

Since our game only needs one game loop, we will create a game-loop definition to get the result of start-game-loop. We then provide the send-play-event! and send-game-event! functions to send the events to the corresponding channels.

The public API of our game loop is thus limited to the three following definitions:

 

Plugging it all together


The core namespace of the Tribolo is responsible for plugging everything together. In particular, it plugs:

  • The appropriate reactions to the main-frame rendering function
  • The game and play events to the channels of the game loop
  • The toggle suggestion event directly to the store toggle-help! function
  • The whole into the reagent/render function to start the rendering

 

Conclusion and what’s next


We are done with state and time management for our Tribolo game. Using Reagent and core.async, we were able to deal with this task rather easily.

The last remaining part of our game to implement is the artificial intelligence. This will be the subject of the the next post.