Paper Title
Learning Multi-Task Transferable Rewards via Variational Inverse Reinforcement Learning
Paper Authors
Paper Abstract
Many robotic tasks are composed of numerous temporally correlated sub-tasks in highly complex environments. To solve such problems effectively, it is important to discover situational intentions and appropriate actions by reasoning over temporal abstractions. To learn intentions that are disentangled from changing task dynamics, we extend an empowerment-based regularization technique to the multi-task setting within a generative adversarial network framework. In multi-task environments with unknown dynamics, we focus on learning a reward function and a policy from unlabeled expert demonstrations. In this study, we define situational empowerment as the maximum of the mutual information representing how an action, conditioned on both a state and a sub-task, affects the future. Our proposed method derives a variational lower bound of this situational mutual information and optimizes it. We learn a transferable multi-task reward function and policy simultaneously by adding the induced term to the objective function. In doing so, the multi-task reward function helps to learn a policy that is robust to environmental changes. We validate the advantages of our approach on multi-task learning and multi-task transfer learning, and demonstrate that the proposed method is robust to both randomness and changing task dynamics. Finally, we show that our method achieves significantly better performance and data efficiency than existing imitation learning methods on various benchmarks.
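As a minimal sketch of the quantity described in the abstract (using our own notation, which is an assumption and not necessarily the paper's): write c for the sub-task label, w(a | s, c) for the source distribution over actions, p(s' | s, a) for the unknown transition dynamics, and q(a | s', s, c) for a variational posterior over actions. The situational empowerment and the standard variational lower bound on its mutual information then take the form

\[
\Phi(s, c) \;=\; \max_{w}\; I(a;\, s' \mid s, c),
\qquad
I(a;\, s' \mid s, c) \;\ge\; \mathbb{E}_{p(s' \mid s, a)\, w(a \mid s, c)}\!\big[ \log q(a \mid s', s, c) \,-\, \log w(a \mid s, c) \big].
\]

Maximizing the right-hand side over w and q yields a tractable estimate of \(\Phi(s, c)\); how the resulting term is added to the adversarial imitation objective is specified in the paper itself rather than in this sketch.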