Title
Pragmatically Learning from Pedagogical Demonstrations in Multi-Goal Environments
Authors
Abstract
Learning-from-demonstration methods usually leverage close-to-optimal demonstrations to accelerate training. By contrast, when demonstrating a task, human teachers deviate from optimal demonstrations and pedagogically modify their behavior by giving demonstrations that best disambiguate the goal they want to demonstrate. Analogously, human learners excel at pragmatically inferring the intent of the teacher, facilitating communication between the two agents. These mechanisms are critical in the few-demonstration regime, where inferring the goal is more difficult. In this paper, we implement pedagogy and pragmatism mechanisms by leveraging a Bayesian model of Goal Inference from demonstrations (BGI). We highlight the benefits of this model in multi-goal teacher-learner setups with two artificial agents that learn with goal-conditioned Reinforcement Learning. We show that combining BGI-agents (a pedagogical teacher and a pragmatic learner) results in faster learning and reduced goal ambiguity over standard learning from demonstrations, especially in the few-demonstration regime. We provide the code for our experiments (https://github.com/Caselles/NeurIPS22-demonstrations-pedagogy-pragmatism), as well as an illustrative video explaining our approach (https://youtu.be/V4n16IjkNyw).
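The abstract does not spell out how Bayesian goal inference, pedagogy, and pragmatism interact. The following toy sketch illustrates the intuition only. It is not the paper's implementation, since the paper's agents learn with goal-conditioned RL. It assumes a finite goal set, discrete demonstrations described by the outcomes they achieve, a uniform "literal" teacher, and a pedagogical/pragmatic recursion in the style of Rational Speech Acts. All names (`DEMOS`, `literal_teacher`, `pedagogical_teacher`, `infer_goal`) are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy setup (assumption): each demonstration is the set of outcomes it achieves.
GOALS = ["A", "B"]
DEMOS = {"d_ab": {"A", "B"}, "d_b": {"B"}}
PRIOR = {g: Fraction(1, len(GOALS)) for g in GOALS}  # uniform prior over goals


def valid_demos(goal):
    """Demonstrations that achieve `goal` (the only ones a teacher may show)."""
    return [d for d, outcomes in DEMOS.items() if goal in outcomes]


def normalize(scores):
    """Rescale non-negative scores so they sum to one."""
    z = sum(scores.values())
    return {k: v / z for k, v in scores.items()}


def literal_teacher(goal):
    """P_T0(d | g): uniform over the demonstrations that achieve the goal."""
    demos = valid_demos(goal)
    return {d: Fraction(1, len(demos)) for d in demos}


def infer_goal(demo, teacher):
    """Bayesian goal inference: P(g | d) is proportional to P(d | g) P(g) under a teacher model."""
    return normalize({g: teacher(g).get(demo, Fraction(0)) * PRIOR[g] for g in GOALS})


def pedagogical_teacher(goal):
    """P_T1(d | g) proportional to P_L0(g | d): prefer demos a literal learner decodes correctly."""
    return normalize({d: infer_goal(d, literal_teacher)[goal] for d in valid_demos(goal)})


literal = infer_goal("d_ab", literal_teacher)        # literal learner: A 2/3, B 1/3
pragmatic = infer_goal("d_ab", pedagogical_teacher)  # pragmatic learner: A 4/5, B 1/5
print(literal, pragmatic)
```

In this toy case the pedagogical teacher avoids the ambiguous demonstration `d_ab` when its goal is `B` and shows `d_b` three times out of four. A pragmatic learner that models this behavior is therefore more confident that `d_ab` signals goal `A` (4/5) than a literal learner (2/3). This mirrors, in miniature, the reduced goal ambiguity the abstract describes.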