Paper Title
An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning
Paper Authors
Paper Abstract
DQN (Deep Q-Network) is a method that performs Q-learning for reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, making them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses an OS-ELM (Online Sequential Extreme Learning Machine) based training algorithm. In addition, we propose a combination of L2 regularization and spectral normalization for the on-device reinforcement learning so that the output values of the neural network fit within a certain range and the reinforcement learning remains stable. The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform. The evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation complete a CartPole-v0 task 29.77x and 89.40x faster, respectively, than a conventional DQN-based approach when the number of hidden-layer nodes is 64.
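As background for the OS-ELM-based training that the abstract refers to, below is a minimal NumPy sketch of the standard OS-ELM sequential update, which replaces backpropagation with a recursive least-squares style update of the output weights. The class name, shapes, and activation are illustrative assumptions, not the paper's actual implementation; the L2-regularization and spectral-normalization details described in the paper are omitted here.

```python
import numpy as np

class OSELMSketch:
    """Single-hidden-layer network with fixed random input weights;
    only the output weights beta are trained, chunk by chunk."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_hidden))  # fixed random input weights
        self.b = rng.standard_normal(n_hidden)          # fixed random biases
        self.beta = np.zeros((n_hidden, n_out))         # trainable output weights
        self.P = np.eye(n_hidden)                       # inverse auto-correlation matrix

    def _hidden(self, x):
        # Hidden-layer activations for a batch of inputs x with shape [n, n_in].
        return np.tanh(x @ self.W + self.b)

    def predict(self, x):
        return self._hidden(x) @ self.beta

    def update(self, x, t):
        # OS-ELM recursive least-squares update for one chunk (x, t):
        #   P    <- P - P H^T (I + H P H^T)^{-1} H P
        #   beta <- beta + P H^T (t - H beta)
        H = self._hidden(x)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (t - H @ self.beta)
```

In a Q-learning setting such as CartPole-v0, the target `t` passed to `update` would be the usual bootstrapped value (reward plus the discounted maximum predicted Q-value of the next state); this usage is an assumption based on standard Q-learning, not a description of the paper's exact training loop.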