Paper title
Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations
Paper authors
Paper abstract
Imitation learning is an effective and safe technique to train robot policies in the real world because it does not depend on an expensive random exploration process. However, due to the lack of exploration, learning policies that generalize beyond the demonstrated behaviors is still an open challenge. We present a novel imitation learning framework to enable robots to 1) learn complex real-world manipulation tasks efficiently from a small number of human demonstrations, and 2) synthesize new behaviors not contained in the collected demonstrations. Our key insight is that multi-task domains often present a latent structure, where demonstrated trajectories for different tasks intersect at common regions of the state space. We present Generalization Through Imitation (GTI), a two-stage offline imitation learning algorithm that exploits this intersecting structure to train goal-directed policies that generalize to unseen start and goal state combinations. In the first stage of GTI, we train a stochastic policy that leverages trajectory intersections to compose behaviors from different demonstration trajectories. In the second stage of GTI, we collect a small set of rollouts from the unconditioned stochastic policy of the first stage, and train a goal-directed agent to generalize to novel start and goal configurations. We validate GTI in both simulated domains and a challenging long-horizon robotic manipulation domain in the real world. Additional results and videos are available at https://sites.google.com/view/gti2020/.
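The two-stage idea in the abstract can be illustrated with a deliberately tiny tabular sketch. This is not the authors' implementation: the state names, the two demonstrations, and every function name below are hypothetical, and simple count tables stand in for the learned policies. Two demonstrations cross at state "C"; a stochastic policy cloned from them (stage 1) can switch branches at that intersection, and a goal-conditioned policy fit on its rollouts (stage 2) can then be asked for a start and goal pair that never appeared together in any demonstration.

```python
import random
from collections import Counter, defaultdict

# Two hypothetical demonstrations that intersect at state "C".
DEMOS = [
    ["A", "C", "B"],  # task 1: start at A, end at B
    ["D", "C", "E"],  # task 2: start at D, end at E
]

def train_stage1(demos):
    """Stage 1 stand-in: per-state empirical distribution over next states."""
    counts = defaultdict(Counter)
    for traj in demos:
        for state, next_state in zip(traj, traj[1:]):
            counts[state][next_state] += 1
    return counts

def rollout(policy, start, rng, max_steps=10):
    """Sample a trajectory from the unconditioned stochastic policy."""
    traj = [start]
    while traj[-1] in policy and len(traj) <= max_steps:
        options = policy[traj[-1]]
        next_state = rng.choices(list(options), weights=list(options.values()))[0]
        traj.append(next_state)
    return traj

def train_stage2(rollouts):
    """Stage 2 stand-in: relabel each rollout with its final state as the goal
    and keep the most frequent next state for every (state, goal) pair."""
    counts = defaultdict(Counter)
    for traj in rollouts:
        goal = traj[-1]
        for state, next_state in zip(traj, traj[1:]):
            counts[(state, goal)][next_state] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def run_goal_policy(goal_policy, start, goal, max_steps=10):
    """Follow the goal-conditioned policy until the goal or an unknown pair."""
    traj = [start]
    while (traj[-1] != goal
           and (traj[-1], goal) in goal_policy
           and len(traj) <= max_steps):
        traj.append(goal_policy[(traj[-1], goal)])
    return traj

rng = random.Random(0)  # fixed seed so the sketch is reproducible
stage1 = train_stage1(DEMOS)
rollouts = [rollout(stage1, start, rng) for start in ("A", "D") for _ in range(50)]
stage2 = train_stage2(rollouts)

# A -> E never appears in DEMOS; it is composed through the intersection at C.
print(run_goal_policy(stage2, "A", "E"))
```

In this toy, generalization comes entirely from the shared state "C": without the intersection, the stage 1 rollouts would only reproduce the two demonstrated paths, and stage 2 would have no data for the unseen start and goal combination.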