Paper Title

Neural Networks for Chess

Author

Klein, Dominik

Abstract

AlphaZero, Leela Chess Zero and Stockfish NNUE revolutionized computer chess. This book gives a complete introduction to the technical inner workings of such engines. The book is split into four main chapters -- excluding chapter 1 (introduction) and chapter 6 (conclusion): Chapter 2 introduces neural networks and covers all the basic building blocks used to build deep networks such as those used by AlphaZero. Contents include the perceptron, back-propagation and gradient descent, classification, regression, the multilayer perceptron, vectorization techniques, convolutional networks, squeeze-and-excitation networks, fully connected networks, batch normalization and rectified linear units, residual layers, and overfitting and underfitting. Chapter 3 introduces the classical search techniques used by chess engines as well as those used by AlphaZero. Contents include minimax, alpha-beta search, and Monte Carlo tree search. Chapter 4 shows how modern chess engines are designed. Aside from the ground-breaking AlphaGo, AlphaGo Zero and AlphaZero, we cover Leela Chess Zero, Fat Fritz, Fat Fritz 2 and Efficiently Updatable Neural Networks (NNUE), as well as Maia. Chapter 5 is about implementing a miniaturized AlphaZero. Hexapawn, a minimalistic version of chess, serves as the running example. Hexapawn is solved by minimax search, and training positions for supervised learning are generated. Then, as a comparison, an AlphaZero-like training loop is implemented in which training is done via self-play combined with reinforcement learning. Finally, AlphaZero-like training and supervised training are compared.
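As an illustration of the search technique named in chapters 3 and 5, here is a minimal Python sketch of minimax with alpha-beta pruning. It is not the book's code; the position interface (is_terminal, evaluate, legal_moves, apply) is a hypothetical stand-in for whatever board representation an engine actually uses.

```python
import math

def alphabeta(pos, depth, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value of `pos` with alpha-beta pruning (a sketch; the
    position interface used here is assumed, not taken from the book)."""
    if depth == 0 or pos.is_terminal():
        return pos.evaluate()  # static score from the maximizer's point of view
    if maximizing:
        value = -math.inf
        for move in pos.legal_moves():
            value = max(value, alphabeta(pos.apply(move), depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # beta cutoff: the minimizer will avoid this line
                break
        return value
    else:
        value = math.inf
        for move in pos.legal_moves():
            value = min(value, alphabeta(pos.apply(move), depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:  # alpha cutoff: the maximizer has a better option
                break
        return value
```

For a game as small as Hexapawn, a full-depth call of such a routine from the starting position already yields the game-theoretic value, which is the sense in which the abstract says Hexapawn is solved by minimax search and labeled training positions for supervised learning can be generated.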
