Paper Title
Back to the Manifold: Recovering from Out-of-Distribution States
Paper Authors
Paper Abstract
Learning from previously collected expert datasets offers the promise of acquiring robotic policies without unsafe and costly online exploration. However, a major challenge is the distributional shift between the states in the training dataset and those visited by the learned policy at test time. While prior work has mainly studied the distribution shift induced by the policy during offline training, the problem of recovering from out-of-distribution states at deployment time remains largely understudied. We alleviate the distributional shift at deployment time by introducing a recovery policy that brings the agent back to the training manifold whenever it steps outside the in-distribution states, e.g., due to an external perturbation. The recovery policy relies on an approximation of the training data density and on a learned equivariant mapping that maps visual observations into a latent space in which translations correspond to robot actions. We demonstrate the effectiveness of the proposed method through several manipulation experiments on a real robotic platform. Our results show that the recovery policy enables the agent to complete tasks where behavioral cloning alone fails because of the distributional shift problem.
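To make the abstract's mechanism concrete, here is a minimal sketch, not the authors' implementation: it assumes a stand-in encoder `phi` in place of the learned equivariant network, approximates the training-data density with a kernel density estimate over latents, and takes recovery actions as gradient-ascent steps on the log-density. Because translations in the latent space correspond to robot actions, the gradient step itself serves as the recovery action. All names, thresholds, and the encoder are illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of the recovery-policy idea.
# Assumptions: `phi` stands in for the learned equivariant encoder, a KDE
# stands in for the training-data density model, and recovery actions are
# gradient-ascent steps on the log-density in latent space.
import numpy as np
from scipy.stats import gaussian_kde

def phi(obs: np.ndarray) -> np.ndarray:
    """Stand-in for the learned encoder: observation -> latent.
    In the paper this is a trained network; here it is the identity."""
    return obs

# Approximate the training-data density in the latent space.
rng = np.random.default_rng(0)
train_latents = rng.normal(size=(500, 2))   # placeholder "training manifold"
kde = gaussian_kde(train_latents.T)         # scipy expects shape (d, n)

def log_density(z: np.ndarray) -> float:
    return float(np.log(kde(z.reshape(-1, 1)) + 1e-12))

def recovery_action(z: np.ndarray, step: float = 0.1, eps: float = 1e-4) -> np.ndarray:
    """Numerical gradient of log p(z). By the latent/action equivariance,
    a latent translation toward higher density is directly a robot action."""
    grad = np.zeros_like(z)
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        grad[i] = (log_density(z + dz) - log_density(z - dz)) / (2 * eps)
    return step * grad

# Schematic deployment loop: while the current latent is low-density
# (out of distribution), run the recovery policy; otherwise hand control
# back to the behavioral-cloning policy.
DENSITY_THRESHOLD = np.log(1e-3)            # hypothetical OOD threshold
obs = rng.normal(loc=4.0, size=2)           # a perturbed, out-of-distribution state
z = phi(obs)
for _ in range(50):
    if log_density(z) >= DENSITY_THRESHOLD:
        break                               # back on the training manifold
    z = z + recovery_action(z)              # latent translation == robot action
print("final latent:", z, "log-density:", log_density(z))
```

The key design point illustrated here is the division of labor: the density model only decides *when* to recover, while the equivariant latent space decides *how*, since moving toward higher training density in that space translates directly into an executable action.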