First, we generate every tic tac toe game state in which the game is still in progress and the player to move has more than one valid option, a mere 4,298 states. We then use an AI that plays a perfect game to determine the "right" move, or set of equally good moves, to make in each case. Perfect means the AI 1) never loses, 2) tries to win as quickly as possible, and 3) moves to take advantage of opponent fumbles.
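The enumeration can be sketched as a simple walk of the game tree from the empty board (a sketch only: the 9-character string board representation and the helper names here are my assumptions, not the project's actual code):

```python
def winner(b):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for i, j, k in lines:
        if b[i] != '.' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def choice_states():
    """All reachable in-progress states where the mover has 2+ options."""
    seen, states = set(), set()
    frontier = ['.' * 9]
    while frontier:
        b = frontier.pop()
        if b in seen:
            continue
        seen.add(b)
        if winner(b) or '.' not in b:
            continue  # game over: no moves from here
        if b.count('.') > 1:
            states.add(b)
        mover = 'X' if b.count('X') == b.count('O') else 'O'
        for i, c in enumerate(b):
            if c == '.':
                frontier.append(b[:i] + mover + b[i+1:])
    return states

print(len(choice_states()))  # the 4,298 figure quoted above
```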
We start with a randomly generated population of neural nets of fixed geometry. Each net is scored by asking it for a move in every game state and counting how many of its answers are "right". The top-scoring nets are favored to populate the next generation through reproduction with a chance of random mutation. The hope is that after letting it run for a while, you'll end up with a net that can play the game relatively well.
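The scoring step might look like this (hypothetical names; the real scorer presumably operates on actual net and game-state types):

```python
def score(net, cases):
    """Count how many states the net answers with a "right" move.
    `net` maps a game state to a chosen move; `cases` pairs each
    state with its set of equally good "right" moves."""
    return sum(1 for state, right in cases if net(state) in right)

# A net that always plays square 4 gets one of these two cases right:
print(score(lambda state: 4, [("s1", {0, 4}), ("s2", {8})]))  # 1
```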
It's easy for the score to get stuck for long periods, perhaps at a local maximum. Since mutation is random, you never know when the right one will happen. Give it a few tens of thousands of generations.
The perfect AI is currently implemented using a modified negamax search algorithm. This is probably the wrong choice. Any tree-based search that assumes each player is playing the best move every turn will run into the same problem: all games end in a draw very quickly, making most moves equivalent, as highlighted in WarGames. My modifications make the search favor winning quickly and try to capitalize on an opponent who might not always make the best move. A better solution would probably be to simply use the algorithm described on Wikipedia's tic tac toe page.
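A minimal version of the quick-win modification is to discount win scores by search depth, so shallower wins score higher. This sketch shows only that part, not the fumble-capitalizing logic, and all names are mine:

```python
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for i, j, k in LINES:
        if b[i] != '.' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def negamax(b, me, depth=0):
    """Return (score, move) for `me` to play on board `b`.
    Wins score 10 minus depth, so faster wins are preferred."""
    opp = 'O' if me == 'X' else 'X'
    if winner(b) == opp:       # the opponent's last move won
        return depth - 10, None
    if '.' not in b:
        return 0, None         # draw
    best_score, best_move = -100, None
    for i, c in enumerate(b):
        if c == '.':
            s, _ = negamax(b[:i] + me + b[i+1:], opp, depth + 1)
            if -s > best_score:
                best_score, best_move = -s, i
    return best_score, best_move
```

With this scoring, `negamax("XX.OO....", "X")` takes the immediate win at square 2 rather than any slower winning line.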
The nets are simple feed-forward artificial neural networks. They use a binary activation function with a variable bias for each neuron (stored, negated, in the code as a "threshold") and a weight for each connection.
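A single such neuron could be modeled as follows (a sketch; `threshold` plays the role of the negated bias described above):

```python
def neuron(inputs, weights, threshold):
    """Binary activation: fire (output 1) iff the weighted input sum
    reaches the threshold, which acts as the negated bias."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

print(neuron([1, 0, 1], [0.5, 0.9, 0.6], 1.0))  # 1  (0.5 + 0.6 >= 1.0)
print(neuron([1, 0, 0], [0.5, 0.9, 0.6], 1.0))  # 0  (0.5 < 1.0)
```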
Each net has 18 input neurons, two for each game square, a few internal layers of various sizes, and one output that represents the desirability of the input game state. A square's first neuron receives input if that square is occupied by the player for whom we're finding moves; the second receives input if it's occupied by the opponent. To find a move to make, each valid move is applied in turn and the resulting game state is evaluated. The move whose resulting state scores highest is chosen.
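The encoding and move selection might be sketched like this (hypothetical names; `evaluate` stands in for a full forward pass through the net, and the 9-character board string is my assumption):

```python
def encode(board, me):
    """18 inputs: for each square, [occupied by me, occupied by opponent]."""
    opp = 'O' if me == 'X' else 'X'
    inputs = []
    for c in board:
        inputs += [1 if c == me else 0, 1 if c == opp else 0]
    return inputs

def choose_move(board, me, evaluate):
    """Try each legal move, score the resulting state with the net,
    and return the move whose state scores highest."""
    best_score, best_move = float('-inf'), None
    for i, c in enumerate(board):
        if c == '.':
            s = evaluate(encode(board[:i] + me + board[i+1:], me))
            if s > best_score:
                best_score, best_move = s, i
    return best_move

# With a toy evaluator that only likes owning the center, X takes square 4:
print(choose_move('.' * 9, 'X', lambda x: x[8]))  # 4
```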
Each generation after the first is populated using genetic material from the previous one. A few of the top-performing individuals are cloned directly into the next generation. The rest is created by combining the genomes of two parents chosen at random, with selection biased toward the top-scoring individuals. Each child is mutated slightly to stir things up. The same individual can be chosen as both parents of a child, in which case the slight mutation is the only change to its genome. The genome consists of all the biases and weights of an individual's neural net.
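The reproduction scheme can be sketched as follows. Uniform crossover, Gaussian mutation, and score-proportional selection (which assumes non-negative scores) are my assumptions; the text doesn't specify exactly how genomes are combined or perturbed:

```python
import random

def crossover(a, b):
    """Uniform crossover: each gene comes from either parent."""
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(genome, rate=0.01, scale=0.5):
    """Perturb a small fraction of genes with Gaussian noise."""
    return [g + random.gauss(0, scale) if random.random() < rate else g
            for g in genome]

def next_generation(scored, elite=2):
    """`scored` is a list of (genome, score) pairs. Clone the top `elite`
    individuals, then fill the rest with mutated children of two parents
    picked at random with probability proportional to score. The same
    individual can be picked as both parents."""
    scored = sorted(scored, key=lambda p: p[1], reverse=True)
    genomes = [g for g, _ in scored]
    weights = [s for _, s in scored]
    out = [list(g) for g in genomes[:elite]]
    while len(out) < len(genomes):
        p1, p2 = random.choices(genomes, weights=weights, k=2)
        out.append(mutate(crossover(p1, p2)))
    return out
```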