Paper Title
Dynamic Experience Replay
Paper Authors
Paper Abstract
We present a novel technique called Dynamic Experience Replay (DER) that allows Reinforcement Learning (RL) algorithms to replay samples not only from human demonstrations but also from successful transitions generated by RL agents during training, thereby improving training efficiency. It can be combined with an arbitrary off-policy RL algorithm, such as DDPG or DQN, and their distributed versions. We build upon Ape-X DDPG and demonstrate our approach on robotic tight-fitting joint assembly tasks, based on force/torque and Cartesian pose observations. In particular, we run experiments on two different tasks: peg-in-hole and lap-joint. In each case, we compare different replay buffer structures and how DER affects them. Our ablation studies show that Dynamic Experience Replay is a crucial ingredient that either largely shortens the training time in these challenging environments or solves tasks that the vanilla Ape-X DDPG cannot solve. We also show that our policies, learned purely in simulation, can be deployed successfully on the real robot. A video presenting our experiments is available at https://sites.google.com/site/dynamicexperiencereplay
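To make the idea in the abstract concrete, the following is a minimal sketch of a DER-style buffer that mixes human-demonstration transitions with successful transitions collected by the agent during training. It is an illustrative assumption, not the authors' implementation: the class name `DynamicExperienceReplay`, the methods `add_successful_episode` and `sample`, the `demo_fraction` parameter, and the `agent_buffer.sample(n)` interface are all hypothetical.

```python
# Sketch of a replay scheme that mixes human demonstrations with
# successful agent transitions (hypothetical names and interfaces).
import random
from collections import deque


class DynamicExperienceReplay:
    def __init__(self, demo_transitions, capacity=100_000, demo_fraction=0.1):
        # Permanent store of human-demonstration transitions.
        self.demos = list(demo_transitions)
        # Dynamically updated store of successful agent transitions.
        self.successes = deque(maxlen=capacity)
        # Fraction of each minibatch drawn from demos / successes.
        self.demo_fraction = demo_fraction

    def add_successful_episode(self, episode):
        """Store the transitions of an episode that reached the goal
        (e.g. the peg was fully inserted)."""
        self.successes.extend(episode)

    def sample(self, batch_size, agent_buffer):
        """Draw a minibatch mixing demo/successful transitions with
        regular transitions from the agent's own replay buffer."""
        n_special = int(batch_size * self.demo_fraction)
        pool = self.demos + list(self.successes)
        special = random.sample(pool, min(n_special, len(pool)))
        regular = agent_buffer.sample(batch_size - len(special))
        return special + regular
```

In an off-policy setup such as DDPG or DQN, a training worker would call `add_successful_episode` whenever an episode succeeds and draw minibatches via `sample`; how the special fraction is chosen and how buffers are shared across distributed actors (as in Ape-X) is a design choice not specified here.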