Paper Title
Regularized Inverse Reinforcement Learning
Paper Authors
Paper Abstract
Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior by acquiring reward functions that explain the expert's decisions. Regularized IRL applies strongly convex regularizers to the learner's policy in order to avoid the expert's behavior being rationalized by arbitrary constant rewards, also known as degenerate solutions. We propose tractable solutions, and practical methods to obtain them, for regularized IRL. Current methods are restricted to the maximum-entropy IRL framework, limiting them to Shannon-entropy regularizers, as well as proposing solutions that are intractable in practice. We present theoretical backing for our proposed IRL method's applicability for both discrete and continuous controls, empirically validating our performance on a variety of tasks.
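
As a minimal sketch of the setup the abstract refers to (the notation below is assumed for illustration and is not taken from the paper): the learner maximizes expected return penalized by a strongly convex regularizer \(\Omega\) applied to its policy, with maximum-entropy IRL recovered as the special case where \(\Omega\) is the negative Shannon entropy.

% Sketch of a regularized RL objective (notation assumed, not from the paper):
% the learner maximizes reward minus a strongly convex policy regularizer \Omega.
\[
J_{\Omega}(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,
\bigl( r(s_t, a_t) \;-\; \Omega\bigl(\pi(\cdot \mid s_t)\bigr) \bigr)\right],
\qquad
\Omega_{\text{Shannon}}\bigl(\pi(\cdot \mid s)\bigr) \;=\; \sum_{a} \pi(a \mid s)\,\log \pi(a \mid s).
\]

Without such a regularizer, a constant reward makes every policy (including the expert's) equally optimal, which is the degeneracy the abstract mentions; strong convexity of \(\Omega\) ties the optimal policy uniquely to the reward and rules this out.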