Paper Title
An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning
Paper Authors
Paper Abstract
DQN (Deep Q-Network) is a method that performs Q-learning for reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, making them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses an OS-ELM (Online Sequential Extreme Learning Machine) based training algorithm. In addition, we propose a combination of L2 regularization and spectral normalization for the on-device reinforcement learning so that the output values of the neural network fit within a certain range and the reinforcement learning remains stable. The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform. The evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation complete a CartPole-v0 task 29.77x and 89.40x faster, respectively, than a conventional DQN-based approach when the number of hidden-layer nodes is 64.
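As background for the OS-ELM-based training that the abstract refers to, below is a minimal NumPy sketch of the standard OS-ELM sequential update, which replaces backpropagation with a recursive least-squares style update of the output weights. The class name, shapes, and activation are illustrative assumptions, not the paper's actual implementation; the L2-regularization and spectral-normalization details described in the paper are omitted here.

```python
import numpy as np

class OSELMSketch:
    """Single-hidden-layer network with fixed random input weights;
    only the output weights beta are trained, chunk by chunk."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_hidden))  # fixed random input weights
        self.b = rng.standard_normal(n_hidden)          # fixed random biases
        self.beta = np.zeros((n_hidden, n_out))         # trainable output weights
        self.P = np.eye(n_hidden)                       # inverse auto-correlation matrix

    def _hidden(self, x):
        # Hidden-layer activations for a batch of inputs x with shape [n, n_in].
        return np.tanh(x @ self.W + self.b)

    def predict(self, x):
        return self._hidden(x) @ self.beta

    def update(self, x, t):
        # OS-ELM recursive least-squares update for one chunk (x, t):
        #   P    <- P - P H^T (I + H P H^T)^{-1} H P
        #   beta <- beta + P H^T (t - H beta)
        H = self._hidden(x)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (t - H @ self.beta)
```

In a Q-learning setting such as CartPole-v0, the target `t` passed to `update` would be the usual bootstrapped value (reward plus the discounted maximum predicted Q-value of the next state); this usage is an assumption based on standard Q-learning, not a description of the paper's exact training loop.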