Paper Title


Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

Paper Authors

Lambert, Nathan O., Wilcox, Albert, Zhang, Howard, Pister, Kristofer S. J., Calandra, Roberto

Paper Abstract


Accurately predicting the dynamics of robotic systems is crucial for model-based control and reinforcement learning. The most common way to estimate dynamics is by fitting a one-step-ahead prediction model and using it to recursively propagate the predicted state distribution over long horizons. Unfortunately, this approach is known to compound even small prediction errors, making long-term predictions inaccurate. In this paper, we propose a new parametrization for supervised learning on state-action data that stably predicts at longer horizons -- which we call a trajectory-based model. This trajectory-based model takes an initial state, a future time index, and control parameters as inputs, and directly predicts the state at the future time index. Experimental results in simulated and real-world robotic tasks show that trajectory-based models yield significantly more accurate long-term predictions, improved sample efficiency, and the ability to predict task reward. With these improved prediction properties, we conclude with a demonstration of methods for using the trajectory-based model for control.
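The key difference between the two parametrizations is how the supervised-learning dataset is built: a one-step model maps (state, action) to the next state, while the trajectory-based model maps (initial state, time index, control parameters) directly to the state at that time index. The sketch below illustrates this dataset construction on a toy trajectory; it is a minimal illustration, and the function names and the time-index normalization are assumptions, not taken from the paper's code.

```python
import numpy as np

def make_one_step_dataset(states, actions):
    """One-step model: inputs (s_t, a_t), targets s_{t+1}."""
    X = np.concatenate([states[:-1], actions[:-1]], axis=1)
    y = states[1:]
    return X, y

def make_trajectory_dataset(states, control_params):
    """Trajectory-based model: inputs (s_0, t, theta), targets s_t.

    Every state along the trajectory becomes a label for the same
    initial state, paired with a (normalized) time index and the
    control parameters theta that generated the trajectory.
    """
    horizon = len(states)
    s0 = np.repeat(states[:1], horizon - 1, axis=0)            # (H-1, d_s)
    t = np.arange(1, horizon, dtype=float)[:, None] / horizon  # time index
    theta = np.repeat(control_params[None, :], horizon - 1, axis=0)
    X = np.concatenate([s0, t, theta], axis=1)
    y = states[1:]
    return X, y

# Tiny example: a 5-step trajectory with 2-d states, 1-d actions,
# and 2-d control parameters (e.g. hypothetical controller gains).
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 2))
actions = rng.normal(size=(5, 1))
theta = np.array([0.3, -1.2])

X1, y1 = make_one_step_dataset(states, actions)
Xt, yt = make_trajectory_dataset(states, theta)
print(X1.shape, Xt.shape)  # (4, 3) (4, 5)
```

Note that at prediction time the trajectory-based model never feeds its own output back in: the state at any horizon is queried directly from (s_0, t, theta), which is why errors do not compound the way they do under recursive one-step rollouts.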
