Hybrid actor-critic algorithm for quantum reinforcement learning at CERN beam lines

Title

Hybrid actor-critic algorithm for quantum reinforcement learning at CERN beam lines

Authors

Schenk, Michael; Combarro, Elías F.; Grossi, Michele; Kain, Verena; Li, Kevin Shing Bruce; Popa, Mircea-Marian; Vallecorsa, Sofia

Abstract

Free energy-based reinforcement learning (FERL) with clamped quantum Boltzmann machines (QBM) was shown to significantly improve the learning efficiency compared to classical Q-learning with the restriction, however, to discrete state-action space environments. In this paper, the FERL approach is extended to multi-dimensional continuous state-action space environments to open the doors for a broader range of real-world applications. First, free energy-based Q-learning is studied for discrete action spaces but continuous state spaces, and the impact of experience replay on sample efficiency is assessed. In a second step, a hybrid actor-critic scheme for continuous state-action spaces is developed based on the Deep Deterministic Policy Gradient algorithm combining a classical actor network with a QBM-based critic. The results obtained with quantum annealing, both simulated and with D-Wave quantum annealing hardware, are discussed, and the performance is compared to classical reinforcement learning methods. The environments used throughout represent existing particle accelerator beam lines at the European Organisation for Nuclear Research (CERN). Among others, the hybrid actor-critic agent is evaluated on the actual electron beam line of the Advanced Plasma Wakefield Experiment (AWAKE).
