AI for Tic-Tac-Toe (Part 4)

Hello, this article is certainly the last one about Tic-Tac-Toe and Artificial Intelligence. The first article (link) presented this little project and two bots with random behaviors: "Full Random" and "Partial Random". If two of its cells are already aligned, "Partial Random" automatically places the third symbol to win. The second article (link) covered two other bots: "The Trapper" and "The Brute". "The Trapper" places its symbols so that it has two winning positions at once, so wherever the opponent plays, "The Trapper" wins. "The Brute" is an invincible bot: it knows every choice and its outcome, so it avoids losing lines of play. The third article (link) presented the first learning bot and an interface for humans. That bot, named "The Bandit", chooses its moves depending on the probability of winning or losing.

So, this article is about a neural network bot. What is a neural network? I suggest you look inside the skull of your favorite neighbor: you should discover an object made of strange matter. This object is a brain, and a brain is a neural network. The aim of mad scientists is to create robots which may replace humans: IBM has already simulated the brain of a monkey (results published during the International Conference for High Performance Computing, Networking, Storage and Analysis at Salt Lake City; SC12).

Neural networks are a complex subject, and I used this article to understand them and implement the bot.

Neural Network

A neural network is composed of many neurons and of links between neurons. A neuron receives data from other neurons and computes a state (0 or 1), which it sends to other neurons.

A neuron connected to other neurons.

The value of the state depends on the states of the input neurons and on a weight associated with each link. The computation is decomposed into two steps: a weighted sum and a thresholding.

 sum = \sum_{i}^{\text{neurons_in}} weight_i \cdot input_i

 output = \begin{cases} 1, & \text{if } sum>\theta \\ 0, & \text{if } sum \le \theta \end{cases}

It is possible to use an activation function (for example, a sigmoid) instead of the thresholding, but I prefer the latter because the program is easier to debug.
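To make the two steps concrete, here is a minimal sketch in Python of a single neuron; the function name and the organization are my own, the original program may differ.

def neuron_output(inputs, weights, theta=0.0):
    # Weighted sum of the inputs, then thresholding against theta.
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > theta else 0

# Example: two inputs, two arbitrary weights, threshold 0.
print(neuron_output([1, -1], [0.68, -0.21]))  # 0.89 > 0, so the state is 1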

Simple example

I consider Tic-Tac-Toe a complex example. The simple example described in this part is a boolean function: XOR. XOR has two inputs and one output, and its behavior is defined in the next table.

 \begin{array}{c|c|c} \rm In 1 & \rm In 2 & \rm out \\ \hline 0 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{array}

In this example, I use five neurons, but the hardest choice with a neural network is how to encode the inputs and outputs. For the boolean inputs, I prefer using -1 and 1 instead of 0 and 1: if all the inputs were set to 0, the weighted sum of each neuron would be null on the first iteration, whatever the weights. I think the system converges faster with this choice. For the output, if the sum is positive, the result is 1, otherwise 0.
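A quick illustration of this encoding argument, with arbitrary weights of my own:

weights = [0.68, -0.21]  # arbitrary weights

# The pair (0, 0) encoded with 0/1 gives a null sum whatever the weights,
# while the -1/1 encoding still produces a usable value.
print(sum(w * x for w, x in zip(weights, [0, 0])))    # 0.0
print(sum(w * x for w, x in zip(weights, [-1, -1])))  # about -0.47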

Example of neural network. At left, the inputs; at right, the output; in the middle, the neurons.

So, if the inputs are 1 and -1 (as in the figure), each neuron computes its weighted sum and the result is compared with the threshold \theta = 0 (a bias added to the sum).

 \begin{matrix} (1 * 0.68) + (-1 * -0.21) = 0.89 &, \text{ as } & 0.89+\theta_1>0 & \text{ then } Out_1=1 \\ (1 * 0.57) + (-1 * 0.59) = -0.02 & , \text{ as } & -0.02+\theta_2 \le 0 & \text{ then } Out_2=0 \\ (1 * 0.82) + (-1 * -0.60) = 1.42 &, \text{ as } & 1.42+\theta_3>0 & \text{ then } Out_3=1 \\ (1 * -0.32) + (-1 * 0.53) = -0.85 &, \text{ as } & -0.85+\theta_4 \le 0 &\text{ then } Out_4=0 \end{matrix}

Next, the same mechanism is applied to the global output.

 (1*-0.44) + (0*0.10) + (1*-0.045) + (0*0.25) = -0.485

As the sum is negative, the brain believes that 1 \oplus 0 = 0. In fact, the result is wrong because the brain has never learnt the rules.
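The whole computation can be reproduced in a few lines of Python, using the weights read from the figure (the variable names and layout are mine):

inputs = [1, -1]  # encodes the boolean pair (1, 0)

hidden_weights = [[0.68, -0.21],   # one list of weights per hidden neuron
                  [0.57,  0.59],
                  [0.82, -0.60],
                  [-0.32, 0.53]]
output_weights = [-0.44, 0.10, -0.045, 0.25]

# Forward pass: weighted sum then thresholding (theta = 0 everywhere).
hidden = [1 if sum(w * x for w, x in zip(ws, inputs)) > 0 else 0
          for ws in hidden_weights]
final_sum = sum(w * h for w, h in zip(output_weights, hidden))

print(hidden)     # [1, 0, 1, 0]
print(final_sum)  # about -0.485, thresholded to 0: the untrained brain answers 0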

When a biological brain is punished, a chemical effect changes the weights of its connections. With a numerical neural network, the learning mechanism is backward error propagation. Arbitrarily, we consider that there is an error of -1 (the value needs to be negative here; it is equal to the current output minus the desired value).

Next, the weights of the connections are modified depending on the chosen error and on the output value of the neuron. The rate is a constant parameter which reduces the effect of the modification (here 0.05).

 weight {+}{=} (-1) * rate * output * error

The output can take two values, 0 and 1, so a weight is modified only if the state of the neuron at the origin of the connection is 1.
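As a sketch, the update of a single connection could look like this; here output is the state of the neuron at the origin of the connection and error the error assigned to the neuron at its end, which is my reading of the formula:

RATE = 0.05  # constant learning rate used in the article

def update_weight(weight, output, error, rate=RATE):
    # Apply: weight += (-1) * rate * output * error
    return weight + (-1) * rate * output * error

# With the arbitrary error of -1 and an active source neuron (output = 1),
# the weight of the connection increases by 0.05:
print(update_weight(-0.44, 1, -1))  # about -0.39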

Backward error propagation in connections

The next step is similar to the normal use of the neural network: for each neuron, an error is computed as a weighted sum.

 sum_{\text{error}} = \sum_{i}^{\text{neurons_out}} error_i * weight_i

Next, the propagation depends on the value of the neuron:

 error_{\text{neuron}} = sum_{\text{error}} * output_{\text{neuron}}

Initially, the bias \theta is set to 0, but we need to modify it as well:

 \theta_{\text{neuron}} {+}{=} (-1) * rate * error_{\text{neuron}}
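Putting the three formulas together for one hidden neuron gives a sketch like this (again, the names and structure are my own):

RATE = 0.05

def backward_step(out_errors, out_weights, neuron_output, theta, rate=RATE):
    # Weighted sum of the errors of the neurons fed by this neuron...
    sum_error = sum(e * w for e, w in zip(out_errors, out_weights))
    # ...propagated only if the neuron was active...
    error_neuron = sum_error * neuron_output
    # ...and used to move the bias.
    theta += (-1) * rate * error_neuron
    return error_neuron, theta

# Hidden neuron 1 of the example: it feeds only the output neuron (error -1)
# through the weight -0.44, and its own output was 1.
print(backward_step([-1], [-0.44], 1, 0.0))  # about (0.44, -0.022)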

Finally, the error is sent to the incoming connections to modify their weights. The next figure shows the neural network after learning. The system made mistakes only nine times before converging to this solution.

Neural Network after learning phase
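To give an idea of the whole learning loop, here is my reconstruction of the training on XOR, assembling the pieces above and using the error Output - Desired discussed later in the conclusion instead of a constant error. It is only a sketch under my own assumptions (for the input layer, the "output" used in the weight update is the input value itself), and it is not guaranteed to reproduce the nine-mistake convergence.

import random

RATE = 0.05
N_HIDDEN = 4
# The four XOR cases, with the boolean inputs encoded as -1 / 1.
CASES = [((-1, -1), 0), ((-1, 1), 1), ((1, -1), 1), ((1, 1), 0)]

def step(s):
    # Thresholding against 0 (the bias is added to the sum before this call).
    return 1 if s > 0 else 0

# Random initial weights; the biases start at 0 as in the article.
w_in = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(N_HIDDEN)]
w_out = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]
theta_h = [0.0] * N_HIDDEN
theta_o = 0.0

for epoch in range(1000):
    mistakes = 0
    for (x1, x2), desired in CASES:
        # Forward pass: hidden layer, then the output neuron.
        hidden = [step(w[0] * x1 + w[1] * x2 + theta_h[i]) for i, w in enumerate(w_in)]
        out = step(sum(wo * h for wo, h in zip(w_out, hidden)) + theta_o)
        error = out - desired  # Error = Output - Desired
        if error == 0:
            continue
        mistakes += 1
        # Backward pass: error of each hidden neuron, then weight and bias updates.
        hidden_errors = [error * w_out[i] * hidden[i] for i in range(N_HIDDEN)]
        w_out = [w_out[i] - RATE * hidden[i] * error for i in range(N_HIDDEN)]
        theta_o += -RATE * error
        for i in range(N_HIDDEN):
            w_in[i] = [w_in[i][j] - RATE * x * hidden_errors[i] for j, x in enumerate((x1, x2))]
            theta_h[i] += -RATE * hidden_errors[i]
    if mistakes == 0:
        print("no more mistakes after", epoch + 1, "passes over the truth table")
        break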

8th competitor: 'The Brain'

The game of Tic-Tac-Toe is a more complex case than the XOR function, so one difficulty is the choice of the inputs and outputs. There is one input per cell of the board: if the cell is empty, the value is 0; if the symbol is a cross, 1; and if the symbol is a nought, -1. There is also one output per cell, and the selected cell is the one with the maximal value.
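In Python, the encoding and the selection could look like this; the board representation with 'x', 'o' and None is an assumption of mine.

def encode_board(board):
    # board: list of 9 cells containing 'x', 'o' or None.
    # One input per cell: 0 if empty, 1 for a cross, -1 for a nought.
    return [0 if c is None else (1 if c == 'x' else -1) for c in board]

def select_cell(outputs):
    # The selected cell is the one with the maximal output value.
    return max(range(len(outputs)), key=lambda i: outputs[i])

board = ['x', None, 'o',
         None, None, None,
         None, None, None]
print(encode_board(board))  # [1, 0, -1, 0, 0, 0, 0, 0, 0]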

"The Brain" need to learn two things:

  • It must choose an empty cell.
  • It must prevent the opponent from aligning three symbols.

For now, we limit the learning to the first point, so the bot needs to learn to select an empty cell.

During the first 1000 games, the bot made 4624 selection errors. In the end, the system converged to 4614. The efficiency of the bot is low, and the question "Why?" is hard to answer: maybe a problem with the number of neurons or their initialization, a bad choice of parameters (rate/error), a bad input/output mechanism, a bad programmer...

I have tested many solutions. For example, I cheated and modified the input values: with another encoding (-1 if the cell is empty, 1 otherwise), I hoped to solve the selection problem. For now, I am abandoning this bot, because I have no idea how to solve the problem.

Conclusion

The creation of this bot is a relative failure. In the article (link), the author tests the program on Tic-Tac-Toe and seems to obtain interesting results. However, that learning is done on particular cases given to the neural network; it does not learn by trial and error.

If we choose a constant error during the execution, the values diverge and tend toward infinity. If we want to keep the values bounded, we need to use this formula:

 Error = Output-Desired

But this requires defining an error for each cell, which means the learning needs to be assisted. I would like a bot that selects a cell and is then only told by the program that it has made a mistake. With assisted learning, the program instead tells the bot where it can play (all the valid positions). In fact, assisted learning works, and the bot eventually selects empty cells without error.
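For example, the assisted error could be computed cell by cell with Error = Output - Desired, where the desired value is 1 for the cells the program declares playable and 0 elsewhere; this is my interpretation of the mechanism, not necessarily the exact one used.

def assisted_errors(outputs, playable):
    # outputs: the 9 output states of the network; playable: indices of the empty cells.
    desired = [1 if i in playable else 0 for i in range(len(outputs))]
    return [out - des for out, des in zip(outputs, desired)]

# The bot fired on cell 0, which is occupied, while only cells 1..8 are empty:
print(assisted_errors([1, 0, 0, 0, 0, 0, 0, 0, 0], playable={1, 2, 3, 4, 5, 6, 7, 8}))
# [1, -1, -1, -1, -1, -1, -1, -1, -1]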

Nevertheless, the neural network experiment does not provide the expected result: the bot is not autonomous.