Paper Title

Model-based Offline Imitation Learning with Non-expert Data

Paper Authors

Jeongwon Park, Lin Yang

Paper Abstract

Although Behavioral Cloning (BC) in theory suffers from compounding errors, its scalability and simplicity still make it an attractive imitation learning algorithm. In contrast, imitation approaches with adversarial training typically do not share the same problem, but necessitate interactions with the environment. Meanwhile, most imitation learning methods utilise only optimal datasets, which can be significantly more expensive to obtain than their suboptimal counterparts. A question that arises is: can we utilise, in a principled manner, the suboptimal dataset that would otherwise sit idle? We propose a scalable model-based offline imitation learning algorithmic framework that leverages datasets collected by both suboptimal and optimal policies, and show that its worst-case suboptimality becomes linear in the time horizon with respect to the expert samples. We empirically validate our theoretical results and show that the proposed method always outperforms BC in the low-data regime on simulated continuous control domains.
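For context on the baseline the abstract compares against, the Behavioral Cloning idea can be sketched as plain supervised regression from expert states to expert actions. This is a minimal illustrative sketch, not the paper's proposed model-based framework; the linear policy class and synthetic data below are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "expert" demonstrations: actions are a fixed linear function
# of states (hypothetical setup, purely for illustration).
W_expert = np.array([[0.5, -1.0],
                     [2.0, 0.3]])
states = rng.normal(size=(200, 2))          # offline expert states
actions = states @ W_expert.T               # corresponding expert actions

# Behavioral Cloning: fit a policy pi(s) = W s to the (state, action)
# pairs by least-squares regression -- no environment interaction needed.
W_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)
W_bc = W_bc.T

def policy(s):
    """Cloned policy: predicted action for state(s) s."""
    return s @ W_bc.T

# On held-out states drawn from the same distribution, the cloned policy
# should closely match the expert; compounding errors arise only once the
# policy is rolled out and drifts off the expert state distribution.
test_states = rng.normal(size=(50, 2))
err = np.max(np.abs(policy(test_states) - test_states @ W_expert.T))
print(err)
```

The sketch highlights why BC is attractive (it is just regression on an offline dataset) and where its weakness lies: the fit is only guaranteed on the expert's state distribution, which is the source of the compounding-error issue the abstract refers to.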
