Paper Title
Self-Adapting Recurrent Models for Object Pushing from Learning in Simulation
Paper Authors
Paper Abstract
Planar pushing remains a challenging research topic, and building a dynamics model of the interaction is the core issue. Even an accurate analytical dynamics model is inherently unstable because physical parameters such as inertia and friction can only be approximated. Data-driven models usually rely on large amounts of training data, but data collection is time consuming when working with real robots. In this paper, we collect all training data in a physics simulator and build an LSTM-based model to fit the pushing dynamics. Domain Randomization is applied to capture the pushing trajectories of a generalized class of objects. When executed on the real robot, the trained recurrent model adapts to the tracked object's real dynamics within a few steps. We propose the algorithm \emph{Recurrent} Model Predictive Path Integral (RMPPI) as a variant of the original MPPI approach, employing state-dependent recurrent models. For comparison, we also train a Deep Deterministic Policy Gradient (DDPG) network as a model-free baseline, which is also used as the action generator in the data collection phase. During policy training, Hindsight Experience Replay is used to improve exploration efficiency. Pushing experiments on our UR5 platform demonstrate the model's adaptability and the effectiveness of the proposed framework.
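The abstract only names the RMPPI idea, so the following is a minimal sketch of how an MPPI-style controller could roll out a learned LSTM dynamics model, assuming the recurrent model adapts to the real object through its hidden state. All names here (`LSTMDynamics`, `rmppi_action`, `cost_fn`, the tensor shapes) are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch: MPPI planning with a learned recurrent (LSTM) dynamics model.
import torch
import torch.nn as nn


class LSTMDynamics(nn.Module):
    """Predicts the next state from (state, action), carrying an LSTM hidden state."""

    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(state_dim + action_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, state_dim)

    def forward(self, state, action, hidden=None):
        x = torch.cat([state, action], dim=-1).unsqueeze(1)   # (K, 1, s+a)
        out, hidden = self.lstm(x, hidden)
        delta = self.head(out.squeeze(1))                      # predict state change
        return state + delta, hidden


def rmppi_action(model, state, hidden, nominal_actions, cost_fn,
                 num_samples=256, noise_std=0.1, temperature=1.0):
    """One control step: sample perturbed action sequences, roll them out through the
    recurrent model (sharing the online-adapted hidden state), and combine them with
    exponentiated-cost weights, as in standard MPPI."""
    horizon, action_dim = nominal_actions.shape
    noise = noise_std * torch.randn(num_samples, horizon, action_dim)
    actions = nominal_actions.unsqueeze(0) + noise              # (K, H, a)

    states = state.expand(num_samples, -1).clone()              # state: (s,)
    # Broadcast the current hidden state (adapted from real observations) to every rollout.
    h = tuple(x.expand(-1, num_samples, -1).contiguous() for x in hidden)
    costs = torch.zeros(num_samples)
    with torch.no_grad():
        for t in range(horizon):
            states, h = model(states, actions[:, t], h)
            costs += cost_fn(states, actions[:, t])

    weights = torch.softmax(-costs / temperature, dim=0)        # low cost -> high weight
    new_plan = nominal_actions + (weights[:, None, None] * noise).sum(dim=0)
    return new_plan[0], new_plan                                 # execute first action, warm-start the rest
```

In this sketch the adaptation described in the abstract would come from feeding the real, tracked object states through the LSTM between control steps, so that `hidden` encodes the object's observed dynamics before each planning call; this is an assumption about the mechanism, not a detail stated in the abstract.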