Investigating Kasparov Chess Moves with scikit-Learn

I am practicing my machine learning (ML) skills and became curious if chess moves could be predicted, somehow.  Crazy right? ;-) tldr:

For a previous project, I collected 547 Gary Kasparov games, in PGN format (ZIP).  For this project there are 21,088 Kasparov-only moves, and I want to look at the prediction accuracy for increasing numbers of moves.

My hypothesis is that this will drop sharply as the number of moves increases, since the complexity or number of unique moves increases over time.  So, I would be happily surprised, if not amazed, if a prediction accuracy of 20% were even possible.

So how do you represent a populated chessboard full game in a format that is digestible by a machine learning algorithm?  How do you represent the target of the move to predict?  Well what I chose to do is to flatten the Forsyth-Edwards Notation (or FEN) into a comma separated value (or CSV) row.  Each piece is encoded as an arbitrary number and unoccupied squares are zero.  I chose p=10, r=11, n=12, b=13, q=14, k=15 and P=20, R=21, N=22, B=23, Q=24, K=25.

For the target, I chose to use the standard algebraic notation (or SAN) of the moves.   (But as of this writing, I wonder if using alternate chess move notations would change prediction accuracy. Hmmm… Update: Ok. Changing the target notation to “UCI” (e.g. “a7a8” form) does not change the prediction accuracy.)

Also, I add the number of the move to the beginning to the list.  So for the initial game setup and an initial move we have:

For a full game, we have rows 1, 3, 5, etc. when Kasparov is the white player and 2, 4, 6, etc. when he plays black.  For every game in the collection, where every move is considered, there are 21,088 rows (as mentioned above).

With this CSV data in hand we can analyze it with various ML algorithms.

First thing is to break the data into training and testing sets.  I do this with the scikit-learn train_test_split() function.  Next is to train different models and check out their accuracy!  (Please see the code for details.)

The first model I tried is k-nearest neighbors (or KNN).  Here is a snip of the code for it:

This yields 0.12101669195751139 for all moves in game.  A 12% prediction accuracy.  I expected this!  And when looking at the accuracy graph for k=1 to 25, the line drops directly as k increases.  Again, I expected this…







The next model tried is Multinomial Naive Bayes.  This results in a 7.2% prediction accuracy.  Oof.

How about a Support vector machine?  That’s a bit better with an 15.4% prediction accuracy.

Ok lastly let’s look at a Decision tree: For all moves, that weighs in at a whopping 17.1%! Woo.

What does the accuracy graph look like given the number of moves?  For example, what is the prediction accuracy when considering only the first moves?  What about as the number increases?

Here is a snip that performs that (given the function definitions defined in the code):

And here is the graph:








In conclusion, this shows that these vanilla (un-tweaked) models, based on re-coding the board FEN, do not predict Kasparov’s moves with any acceptable degree of accuracy.


Surprisingly, the addition of two extra columns – the move distances from the king and the opponent king – increases the decision tree prediction accuracy of all moves to 28.22% – Woo!


When I compute this move prediction with the multi-layer perceptron, MLPClassifier (a type of neural network), an accuracy score of 22.59% is achieved.  Somehow, I thought this would be higher.  Hmmm.  :\