Paper Title

Domain-Adversarial and Conditional State Space Model for Imitation Learning

Authors

Ryo Okumura, Masashi Okada, Tadahiro Taniguchi

Abstract

State representation learning (SRL) in partially observable Markov decision processes has been studied to learn abstract features of data useful for robot control tasks. For SRL, acquiring domain-agnostic states is essential for achieving efficient imitation learning. Without these states, imitation learning is hampered by domain-dependent information that is useless for control. However, existing methods fail to remove such disturbances from the states when the data from experts and agents show large domain shifts. To overcome this issue, we propose a domain-adversarial and conditional state space model (DAC-SSM) that enables control systems to obtain domain-agnostic and task- and dynamics-aware states. DAC-SSM jointly optimizes the state inference, observation reconstruction, forward dynamics, and reward models. To remove domain-dependent information from the states, the model is trained with domain discriminators in an adversarial manner, and the reconstruction is conditioned on domain labels. We experimentally evaluated the model predictive control performance via imitation learning for continuous control of sparse reward tasks in simulators and compared it with the performance of the existing SRL method. The agents from DAC-SSM achieved performance comparable to that of experts and more than twice that of the baselines. We conclude that domain-agnostic states are essential for imitation learning under large domain shifts and that they can be obtained using DAC-SSM.
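
The two mechanisms named in the abstract, adversarial training against a domain discriminator and reconstruction conditioned on domain labels, can be illustrated with a minimal sketch. The abstract does not give the loss terms or their weighting, so everything below is an assumption made for illustration: the function names (`binary_cross_entropy`, `encoder_objective`, `decoder_input`), the `adv_weight` parameter, the binary expert/agent domain label, and the plain sum of the model losses are all hypothetical and not taken from the paper.

```python
import math


def binary_cross_entropy(prob, label):
    """Cross-entropy of a predicted domain probability against a 0/1 label."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(prob + eps)
             + (1 - label) * math.log(1 - prob + eps))


def encoder_objective(recon_loss, reward_loss, kl_loss,
                      disc_prob, domain_label, adv_weight=1.0):
    """Hypothetical objective for the state-inference side of the model.

    Assumption: the model losses are summed, and the discriminator's loss is
    subtracted, so minimizing this objective pushes the states toward
    carrying no information the discriminator can use to tell domains apart.
    The discriminator itself would be trained separately to minimize
    `disc_loss`.
    """
    disc_loss = binary_cross_entropy(disc_prob, domain_label)
    total = recon_loss + reward_loss + kl_loss - adv_weight * disc_loss
    return total, disc_loss


def decoder_input(state, domain_label, num_domains=2):
    """Append a one-hot domain label to the state before decoding.

    Assumption: conditioning is done by concatenation, so the decoder can
    take domain-specific appearance from the label rather than the state.
    """
    one_hot = [1.0 if i == domain_label else 0.0 for i in range(num_domains)]
    return list(state) + one_hot


if __name__ == "__main__":
    total, disc_loss = encoder_objective(
        recon_loss=1.0, reward_loss=0.5, kl_loss=0.2,
        disc_prob=0.5, domain_label=1)
    print(total, disc_loss)
    print(decoder_input([0.1, 0.2], domain_label=1))
```

In this sketch a discriminator output of 0.5 means it cannot distinguish the domains, which is the point at which its cross-entropy stops improving for a balanced binary label. The conditioning step is what lets reconstruction remain accurate even though the state itself is discouraged from encoding the domain.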
