Paper Title
Option-Aware Adversarial Inverse Reinforcement Learning for Robotic Control
Paper Authors
Abstract
Hierarchical Imitation Learning (HIL) has been proposed to recover highly complex behaviors in long-horizon tasks from expert demonstrations by modeling the task hierarchy with the option framework. Existing methods either overlook the causal relationship between the subtask and its corresponding policy or cannot learn the policy in an end-to-end fashion, which leads to suboptimality. In this work, we develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning and adapt it with the Expectation-Maximization algorithm in order to directly recover a hierarchical policy from unannotated demonstrations. Further, we introduce a directed information term into the objective function to enhance the causality, and propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion. Theoretical justifications and evaluations on challenging robotic control tasks are provided to show the superiority of our algorithm. The code is available at https://github.com/LucasCJYSDL/HierAIRL.
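To make the option framework mentioned in the abstract concrete, here is a minimal, self-contained sketch of a hierarchical policy: a high-level policy selects a sub-task (option) conditioned on the state and the previously active option, and a low-level policy selects the primitive action conditioned on the active option. The tabular softmax parameterization and all names here are illustrative assumptions for exposition only, not the paper's actual model or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


class OptionPolicy:
    """Illustrative tabular one-step option policy (hypothetical, for
    exposition): pi_hi(option | state, prev_option) picks the sub-task,
    pi_lo(action | state, option) picks the primitive action."""

    def __init__(self, n_states, n_options, n_actions):
        self.n_options = n_options
        # Random unnormalized logits; a softmax turns each row into a
        # conditional distribution. Index n_options in the prev_option
        # axis is a dummy "no option yet" slot used at the first step.
        self.hi_logits = rng.standard_normal((n_states, n_options + 1, n_options))
        self.lo_logits = rng.standard_normal((n_states, n_options, n_actions))

    @staticmethod
    def _softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def act(self, state, prev_option):
        # High level: sample the active option given state and prev_option.
        p_opt = self._softmax(self.hi_logits[state, prev_option])
        option = rng.choice(len(p_opt), p=p_opt)
        # Low level: sample the primitive action given state and option.
        p_act = self._softmax(self.lo_logits[state, option])
        action = rng.choice(len(p_act), p=p_act)
        return option, action


policy = OptionPolicy(n_states=4, n_options=2, n_actions=3)
# prev_option == n_options (here 2) marks "no option chosen yet".
opt, act = policy.act(state=0, prev_option=2)
```

In HIL settings like the one described above, the option labels are latent in the unannotated demonstrations, which is why the paper resorts to an Expectation-Maximization-style adaptation rather than supervised learning of the two levels separately.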