Combating False Negatives in Adversarial Imitation Learning

Paper title

Combating False Negatives in Adversarial Imitation Learning

Authors

Konrad Zolna, Chitwan Saharia, Leonard Boussioux, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

Abstract

In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Even though the task is successfully accomplished in some of the agent's trajectories, the discriminator is trained to output low values for them. We hypothesize that this inconsistent training signal for the discriminator can impede its learning and consequently lead to worse overall performance of the agent. We show experimental evidence for this hypothesis and that the "false negatives" (i.e. successful agent episodes) significantly hinder adversarial imitation learning, which is the first contribution of this paper. Then, we propose a method to alleviate the impact of false negatives and test it on the BabyAI environment. This method consistently improves sample efficiency over the baselines by at least an order of magnitude.
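
The abstract does not spell out the proposed method, so the following is only a minimal sketch of the problem it describes, not the authors' algorithm. It assumes a discriminator that outputs a probability per episode and is trained with binary cross-entropy (expert episodes labeled 1, agent episodes labeled 0). The function names, the scores, and the `agent_success` flags are all hypothetical, and dropping successful agent episodes from the negative set is just one illustrative way to remove the inconsistent signal.

```python
import math

def bce(p, label):
    """Binary cross-entropy for a single prediction p in (0, 1)."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

def discriminator_loss(expert_scores, agent_scores, agent_success,
                       drop_false_negatives):
    """Mean BCE over expert episodes (label 1) and agent episodes (label 0).

    agent_success[i] is True when agent episode i solved the task.
    When drop_false_negatives is True, those episodes are excluded from
    the negative set. This is an illustrative assumption, not the
    method from the paper.
    """
    losses = [bce(p, 1.0) for p in expert_scores]
    for p, success in zip(agent_scores, agent_success):
        if drop_false_negatives and success:
            continue  # skip a successful agent episode (a "false negative")
        losses.append(bce(p, 0.0))
    return sum(losses) / len(losses)

# Hypothetical discriminator outputs for two expert and two agent episodes.
expert_scores = [0.9, 0.8]
agent_scores = [0.85, 0.2]     # the first agent episode looks expert-like
agent_success = [True, False]  # and it did in fact solve the task

standard = discriminator_loss(expert_scores, agent_scores, agent_success, False)
filtered = discriminator_loss(expert_scores, agent_scores, agent_success, True)
print(f"standard: {standard:.3f}, filtered: {filtered:.3f}")
```

In the standard setting the successful agent episode is penalized for receiving a high score even though it resembles the expert data, which is the inconsistent training signal the abstract refers to. With that episode excluded, the loss no longer pushes the discriminator to score successful behavior low.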
