Paper Title
Data-Efficient Learning for Complex and Real-Time Physical Problem Solving using Augmented Simulation
Paper Authors
Abstract
Humans quickly solve tasks in novel systems with complex dynamics without requiring much interaction. Although deep reinforcement learning algorithms have achieved tremendous success on many complex tasks, they need a large number of samples to learn meaningful policies. In this paper, we present the task of navigating a marble to the center of a circular maze. Although this system is intuitive and easy for humans to solve, standard reinforcement learning algorithms can find it very difficult and inefficient to learn meaningful policies for it. We present a model that learns to move a marble in this complex environment within minutes of interacting with the real system. Learning consists of initializing a physics engine with parameters estimated from real-system data. The error in the physics engine is then corrected using Gaussian process regression, which models the residual between real observations and physics engine simulations. The physics engine augmented with the residual model is then used to control the marble in the maze environment using model-predictive feedback over a receding horizon. To the best of our knowledge, this is the first time that a hybrid model consisting of a full physics engine and a statistical function approximator has been used to control a complex physical system in real time using nonlinear model-predictive control (NMPC).
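The residual-correction step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' setup: `true_step` is a hypothetical 1-D system standing in for the real maze, `sim_step` stands in for the physics engine and deliberately omits two force terms, and `ResidualGP` is a bare-bones Gaussian process posterior mean with a fixed RBF kernel and hand-picked hyperparameters. Parameter estimation for the engine and the NMPC controller are not shown.

```python
import numpy as np

DT = 0.05  # integration step, chosen arbitrarily for this toy example


def true_step(x, u):
    """Stand-in for the real system (hypothetical 1-D dynamics)."""
    pos, vel = x
    acc = u - 0.8 * vel - 0.5 * np.sin(pos)
    return np.array([pos + DT * vel, vel + DT * acc])


def sim_step(x, u):
    """Stand-in for the physics engine: it misses the drag and sin terms."""
    pos, vel = x
    return np.array([pos + DT * vel, vel + DT * u])


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


class ResidualGP:
    """GP regression (posterior mean only) on real-minus-simulated residuals."""

    def __init__(self, inputs, residuals, noise=1e-4):
        self.inputs = inputs
        gram = rbf_kernel(inputs, inputs) + noise * np.eye(len(inputs))
        self.weights = np.linalg.solve(gram, residuals)

    def predict(self, query):
        return rbf_kernel(query, self.inputs) @ self.weights


def sample_transitions(rng, n):
    """Draw random (state, control) pairs and step both models once."""
    states = rng.uniform(-2.0, 2.0, size=(n, 2))
    controls = rng.uniform(-1.0, 1.0, size=n)
    real = np.array([true_step(x, u) for x, u in zip(states, controls)])
    sim = np.array([sim_step(x, u) for x, u in zip(states, controls)])
    return np.column_stack([states, controls]), real, sim


rng = np.random.default_rng(0)

# Fit the GP to the residual between "real" and simulated next states.
train_in, train_real, train_sim = sample_transitions(rng, 200)
gp = ResidualGP(train_in, train_real - train_sim)

# Hybrid model: simulator prediction plus the learned residual correction.
test_in, test_real, test_sim = sample_transitions(rng, 100)
hybrid = test_sim + gp.predict(test_in)

sim_err = np.abs(test_real - test_sim).mean()
hybrid_err = np.abs(test_real - hybrid).mean()
print(f"simulator error: {sim_err:.4f}, hybrid error: {hybrid_err:.4f}")
```

In a setting like the one the abstract describes, the hybrid prediction (`sim_step` output plus the GP correction) would serve as the dynamics model queried by the receding-horizon controller. A practical implementation would also fit the kernel hyperparameters to data rather than fixing them as done here.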