Paper Title

Deep Reinforcement Learning with Linear Quadratic Regulator Regions

Authors

Gabriel I. Fernandez, Colin Togashi, Dennis W. Hong, Lin F. Yang

Abstract

Practitioners often rely on compute-intensive domain randomization to ensure that reinforcement learning policies trained in simulation transfer robustly to the real world. Due to unmodeled nonlinearities in the real system, however, even such policies can fail to perform stably enough to gather experience in real environments. In this paper we propose a novel method that guarantees a stable region of attraction for the output of a policy trained in simulation, even for highly nonlinear systems. Our core technique is to use "bias-shifted" neural networks to construct the controller and to train the network in the simulator. The modified neural network not only captures the nonlinearities of the system but also provably preserves linearity in a certain region of the state space, and thus can be tuned to resemble a linear quadratic regulator that is known to be stable for the real system. We have tested our method by transferring simulated swing-up policies for an inverted pendulum to a real system and demonstrated its efficacy.
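
To make the abstract's core idea concrete, below is a minimal NumPy/SciPy sketch (not the authors' implementation) of the two ingredients it describes: an LQR gain computed for a linearized inverted pendulum, and a ReLU network whose biases are shifted so that it is exactly linear, and equal to the LQR law, on a box around the equilibrium. The physical parameters, cost weights, network width, and region radius `r` are all illustrative assumptions, and the paper's actual bias-shift construction and training procedure are not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# --- 1. LQR gain for a linearized inverted pendulum --------------------
# Torque-driven pendulum linearized about the upright equilibrium;
# state x = [angle, angular rate]. Parameters are assumptions.
g, l, m, b = 9.81, 0.5, 0.15, 0.05
A = np.array([[0.0,   1.0],
              [g / l, -b / (m * l**2)]])
B = np.array([[0.0],
              [1.0 / (m * l**2)]])
Q = np.diag([10.0, 1.0])  # assumed state-cost weights
R = np.array([[0.1]])     # assumed input-cost weight

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)  # u = -K x stabilizes the linearized system

# --- 2. A bias-shifted ReLU network, exactly linear near the origin ----
rng = np.random.default_rng(0)
n_hidden = 32
W1 = rng.normal(scale=0.5, size=(n_hidden, 2))

# Shift each hidden bias above the worst-case magnitude of W1 @ x over the
# box ||x||_inf <= r, so every ReLU stays in its active (affine) regime there.
r = 0.2
b1 = np.abs(W1).sum(axis=1) * r + 1e-3

# With all ReLUs active, f(x) = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2),
# so choosing W2 W1 = -K and W2 b1 + b2 = 0 pins the local map to the LQR law.
W2 = -K @ np.linalg.pinv(W1)  # pinv(W1) @ W1 = I since W1 has full column rank
b2 = -(W2 @ b1)

def policy(x):
    """Bias-shifted ReLU controller: equals -K x whenever ||x||_inf <= r."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

# Sanity check: inside the region the network reproduces the LQR law exactly.
x = np.array([0.1, -0.15])
assert np.allclose(policy(x), -K @ x)
```

The bias shift works because, on the box ||x||_inf <= r, the Hölder bound |W1[i] @ x| <= r * sum_j |W1[i, j]| keeps every ReLU active, so the network collapses to a single affine map that can be matched to the LQR controller; outside that region the network remains nonlinear and free to be trained (e.g., by reinforcement learning in simulation) for the swing-up phase.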
