Paper Title

Methodical Advice Collection and Reuse in Deep Reinforcement Learning

Paper Authors

Sahir, Ercüment İlhan, Srijita Das, Matthew E. Taylor

Paper Abstract

Reinforcement learning (RL) has shown great success in solving many challenging tasks via the use of deep neural networks. Although using deep learning for RL brings immense representational power, it also causes a well-known sample-inefficiency problem: the algorithms are data-hungry and require millions of training samples to converge to an adequate policy. One way to combat this issue is to use action advising in a teacher-student framework, where a knowledgeable teacher provides action advice to help the student. This work considers how to better leverage uncertainties about when a student should ask for advice, and whether the student can model the teacher in order to ask for less advice. The student could decide to ask for advice when it is uncertain, or when both it and its model of the teacher are uncertain. In addition to this investigation, this paper introduces a new method to compute uncertainty for a deep RL agent using a secondary neural network. Our empirical results show that using dual uncertainties to drive advice collection and reuse may improve learning performance across several Atari games.
