Paper Title
Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning
Paper Authors
Paper Abstract
Model-free deep reinforcement learning (RL) agents can learn an effective policy directly from repeated interactions with a black-box environment. In practice, however, these algorithms often require large amounts of training experience to learn and generalize well. In addition, classic model-free learning ignores the domain information contained in the state transition tuples. Model-based RL, on the other hand, attempts to learn a model of the environment from experience and is substantially more sample efficient, but suffers from a significantly larger asymptotic bias owing to its imperfect dynamics model. In this paper, we propose a gradient matching algorithm that improves sample efficiency by utilizing target slope information from the dynamics predictor to aid the model-free learner. We demonstrate this by presenting a technique that matches the gradient information from the model-based learner with that of the model-free component in an abstract low-dimensional space, and we validate the proposed technique through experimental results that demonstrate its efficacy.
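To make the core idea concrete, below is a minimal PyTorch sketch of one way such a gradient-matching objective could look: a standard model-free TD loss is augmented with a penalty that matches the gradient ("slope") of the Q-function with respect to the latent state against a target slope derived from a learned dynamics predictor. Every component here — the encoder/Q-network split, the latent dynamics and reward models, and the weighting `LAM` — is an illustrative assumption, not the authors' actual architecture or code.

```python
# Hypothetical sketch of gradient matching in an abstract latent space.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, N_ACTIONS, GAMMA, LAM = 8, 4, 0.99, 0.1

# Encoder maps raw states into the abstract low-dimensional space.
encoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))
# Model-free action-value head on top of the latent space.
qnet = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
# Learned latent dynamics and reward predictors (the model-based side).
dynamics = nn.Sequential(nn.Linear(LATENT_DIM + N_ACTIONS, 64), nn.ReLU(),
                         nn.Linear(64, LATENT_DIM))
reward_model = nn.Sequential(nn.Linear(LATENT_DIM + N_ACTIONS, 64), nn.ReLU(),
                             nn.Linear(64, 1))


def target_slope(z, a_onehot):
    """Gradient w.r.t. the latent state of a one-step model-based value
    estimate; serves as the matching target for the model-free learner."""
    z = z.detach().requires_grad_(True)
    za = torch.cat([z, a_onehot], dim=1)
    v_mb = reward_model(za).squeeze(1) + GAMMA * qnet(dynamics(za)).max(dim=1).values
    (g,) = torch.autograd.grad(v_mb.sum(), z)
    return g.detach()


def loss_fn(s, a, r, s_next):
    """Model-free TD loss plus the gradient-matching penalty."""
    z = encoder(s)
    a_onehot = F.one_hot(a, N_ACTIONS).float()

    # Standard one-step TD error.
    q = qnet(z).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        td_target = r + GAMMA * qnet(encoder(s_next)).max(dim=1).values
    td_loss = F.mse_loss(q, td_target)

    # Slope of the model-free Q-value w.r.t. the latent state...
    z_leaf = z.detach().requires_grad_(True)
    q_sel = qnet(z_leaf).gather(1, a.unsqueeze(1)).sum()
    (grad_q,) = torch.autograd.grad(q_sel, z_leaf, create_graph=True)

    # ...matched against the target slope from the dynamics predictor.
    gm_loss = F.mse_loss(grad_q, target_slope(z, a_onehot))
    return td_loss + LAM * gm_loss


# Usage on a dummy batch:
s, s_next = torch.randn(32, 16), torch.randn(32, 16)
a, r = torch.randint(0, N_ACTIONS, (32,)), torch.randn(32)
loss = loss_fn(s, a, r, s_next)
loss.backward()
```

In this sketch the target slope is detached, so the dynamics predictor only supplies supervision and is not updated by the matching term; the penalty itself is a gradient-of-a-gradient objective, which PyTorch supports via `create_graph=True`.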