Monday, March 14, 2022

Making Santa-Tac-Toe part 3 of 4

Part 3: Random Moves Meet Monte Carlo

Thanks in no small part to AlphaGo, there is huge interest in Monte Carlo Tree Search (MCTS). As the name implies, it is built on the idea of a Monte Carlo simulation, where you run a large number of random simulations to help determine the best configuration of whatever is being simulated. Tic-Tac-Toe is too simple to need MCTS, but the simulation aspect is easy to implement and could be interesting, so let's do that. We will then look at what would need to be done to turn the simulation into a proper MCTS AI.

The heart of a simulation is the ability to play a random game. This is simply a matter of making random moves until the game is in a win or tie state, at which point we return the state of the game so we can keep track of the wins and losses.

playMockGame(board, curPlayer) {
    // play random moves until the game ends, then report the final state
    let b = new TTTBoard(board.getTemplate());
    let state = b.getGameState();
    let p = curPlayer;
    while (state.state === TTTConsts.PLAYING) {
        b.setTileByIndex(super.getNextMove(b, p), p);
        p = (p === TTTConsts.X) ? TTTConsts.O : TTTConsts.X;
        state = b.getGameState();
    }
    return state;
}


We want to score each potential move a player can make, so we simply loop over all nine tiles, adding each tile's score to a scores array. That array is what we will use in the getNextMove method our AI API requires us to implement.

buildScoreBoard(board, curPlayer) {
    // score every tile (0..8) for the player about to move
    let scores = [];
    for (let i = 0; i < 9; ++i)
        scores.push(this.scoreTile(board, i, curPlayer));
    return scores;
}

Scoring a tile is simply a matter of making sure the move is valid, then playing several mock games and counting how many of them the AI wins versus how many it loses.

scoreTile(board, index, curPlayer) {
    // occupied tiles get a score worse than losing every trial
    if (board.getTileByIndex(index) !== TTTConsts.EMPTY)
        return -2 * this.trials;
    let winState = (curPlayer === TTTConsts.X) ? TTTConsts.X_WON : TTTConsts.O_WON;
    let loseState = (curPlayer === TTTConsts.X) ? TTTConsts.O_WON : TTTConsts.X_WON;
    let score = 0;
    // play the candidate move on a scratch board, then run random games from there
    this.testBoard.clone(board);
    this.testBoard.setTileByIndex(index, curPlayer);
    let opponent = (curPlayer === TTTConsts.X) ? TTTConsts.O : TTTConsts.X;
    for (let i = 0; i < this.trials; ++i) {
        let state = this.playMockGame(this.testBoard, opponent);
        if (state.state === winState) score += 1;
        if (state.state === loseState) score -= 1;
    }
    return score;
}

Finally, implementing getNextMove is simply a matter of calling buildScoreBoard and then picking the move with the highest score. We add a random value to each score as a tiebreaker.

getNextMove(board, playerToMove = TTTConsts.EMPTY) {
    let scores = this.buildScoreBoard(board, playerToMove);
    // pick the best-scoring tile, using a small random value as a tiebreaker
    let index = 0;
    let high = scores[0] + Math.random();
    for (let i = 1; i < 9; ++i) {
        let temp = scores[i] + Math.random();
        if (temp > high) {
            index = i;
            high = temp;
        }
    }
    return index;
}
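
The methods above lean on a few things that are set up elsewhere in the class: a trials count, a reusable testBoard, and a parent class whose getNextMove plays a random move (hence the super.getNextMove call). As a rough sketch only, the pieces might be tied together like this; the class name, base class name, constructor, and the no-argument TTTBoard call are assumptions for illustration, not the project's actual code.

// Sketch of how the methods above might be assembled. MCSAI, RandomAI and the
// constructor details are assumptions; RandomAI is assumed to be the
// random-move AI from an earlier part, whose getNextMove picks a random
// empty tile.
class MCSAI extends RandomAI {
    constructor(trials = 100) {
        super();
        this.trials = trials;            // mock games played per candidate tile
        this.testBoard = new TTTBoard(); // scratch board reused by scoreTile
                                         // (exact construction is a guess)
    }

    // playMockGame, buildScoreBoard, scoreTile and getNextMove from above
    // would live here.
}

// Usage, given an existing TTTBoard instance named board:
let ai = new MCSAI(200);
let move = ai.getNextMove(board, TTTConsts.X);
board.setTileByIndex(move, TTTConsts.X);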

The MCTS algorithm is a bit more involved than what we are doing here, but with such a small game tree, implementing a full MCTS is purely academic. To do MCTS properly we would need to track a bit more information for each node in the game tree. Let's quickly take a look at the algorithm. It is broken into four basic steps, with a rough sketch of the extra bookkeeping after the list.

1. Selection. The idea here is that we pick a node that we want to explore further. This is usually done by looking at the win/loss ratio in combination with the number of times a node has been visited, for the nodes that are not yet fully expanded. We want to pick the most promising node, but at the same time we want to make sure each node is explored enough to properly estimate its worthiness. There are many ways to do this; the simplest is something like threshold * (wins+1)/(plays+1), but better methods are available.

2. Expansion. Here we simply add one of the unexplored children of the selected node to our tree. Instead of just having a scoreboard for the current round, we would need to track all the nodes we have expanded in some type of tree structure and keep a win/loss ratio for each of them.

3. Simulation. Playing a random game, which is what we are already doing.

4. Backpropagation. We take the result the expanded node received and pass it up through all of its parent nodes, adjusting their win/loss counts. Remember to take into account who won, so that the win/loss ratio is inverted for the opponent's nodes.
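
To make the four steps a little more concrete, here is a minimal sketch of the bookkeeping a node would need, using the board API from above. The node layout, the UCB1-style selection formula, and the helper names are illustrative assumptions, not code from this project.

// Assumed helper: indices of every empty tile on a board.
function emptyTiles(board) {
    let moves = [];
    for (let i = 0; i < 9; ++i)
        if (board.getTileByIndex(i) === TTTConsts.EMPTY) moves.push(i);
    return moves;
}

class MCTSNode {
    constructor(parent, move, player) {
        this.parent = parent;     // null for the root
        this.move = move;         // tile index that produced this position
        this.player = player;     // player who made that move
        this.children = [];
        this.untriedMoves = [];   // empty tiles not yet expanded into children
        this.wins = 0;            // wins from this.player's point of view
        this.plays = 0;
    }

    // 1. Selection: UCB1 balances win rate (exploitation) against visit
    // count (exploration). Assumes this node is fully expanded, so every
    // child has been played at least once.
    selectChild(c = Math.SQRT2) {
        let best = null, bestScore = -Infinity;
        for (let child of this.children) {
            let score = child.wins / child.plays +
                        c * Math.sqrt(Math.log(this.plays) / child.plays);
            if (score > bestScore) { bestScore = score; best = child; }
        }
        return best;
    }

    // 2. Expansion: turn one untried move into a new child node.
    expand(board) {
        let move = this.untriedMoves.pop();
        let next = (this.player === TTTConsts.X) ? TTTConsts.O : TTTConsts.X;
        board.setTileByIndex(move, next);
        let child = new MCTSNode(this, move, next);
        child.untriedMoves = emptyTiles(board);
        this.children.push(child);
        return child;
    }

    // 4. Backpropagation: walk back up to the root, crediting a win only to
    // the nodes owned by the winning player, which is what inverts the
    // win/loss ratio between opponent levels.
    backpropagate(winner) {
        for (let node = this; node !== null; node = node.parent) {
            node.plays += 1;
            if (winner === node.player) node.wins += 1;
        }
    }
}

Step 3, the simulation, is exactly the playMockGame method above: run a random playout from the newly expanded position and hand the winner to backpropagate.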

The big issue here is that it is easy to run out of memory as the tree expands. One solution is to cap the number of nodes allocated and, when you run out, go into a pruning stage where you remove the nodes of poorly performing choices. A node pool is a good way of doing this: when the pool is empty, you do a pruning pass that returns the pruned nodes (including their children) to the pool.
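
A node pool along those lines could look something like the sketch below, reusing the MCTSNode shape from the previous sketch; again, every name and detail here is an assumption for illustration rather than a definitive implementation.

// Sketch of a fixed-size node pool with a pruning pass. The pruning policy
// shown (drop the root's worst-scoring branches) is a deliberately simple
// placeholder.
class NodePool {
    constructor(capacity, treeRoot) {
        this.root = treeRoot;     // tree root, used as the pruning target
        this.free = [];
        for (let i = 0; i < capacity; ++i)
            this.free.push(new MCTSNode(null, -1, TTTConsts.EMPTY));
    }

    // Hand out a recycled node; if the pool is dry, prune first.
    acquire(parent, move, player) {
        if (this.free.length === 0) this.prune();
        let node = this.free.pop();
        node.parent = parent;
        node.move = move;
        node.player = player;
        node.children.length = 0;
        node.untriedMoves.length = 0;
        node.wins = 0;
        node.plays = 0;
        return node;
    }

    // Return a node and its entire subtree to the pool.
    release(node) {
        for (let child of node.children) this.release(child);
        node.children.length = 0;
        this.free.push(node);
    }

    // Pruning pass: recycle the worst-performing branches under the root.
    prune() {
        this.root.children.sort((a, b) =>
            (b.wins / (b.plays || 1)) - (a.wins / (a.plays || 1)));
        while (this.root.children.length > 1 && this.free.length === 0)
            this.release(this.root.children.pop());
    }
}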

Now that we have several AI agents, we can put together our Christmas game. This will be covered in the next part.