Paper Title

Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation

Authors

Vincent Mai, Kaustubh Mani, Liam Paull

Abstract

In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency. As this noise is heteroscedastic, its effects can be mitigated using uncertainty-based weights in the optimization process. Previous methods rely on sampled ensembles, which do not capture all aspects of uncertainty. We provide a systematic analysis of the sources of uncertainty in the noisy supervision that occurs in RL, and introduce inverse-variance RL, a Bayesian framework which combines probabilistic ensembles and Batch Inverse Variance weighting. We propose a method whereby two complementary uncertainty estimation methods account for both the Q-value and the environment stochasticity to better mitigate the negative impacts of noisy supervision. Our results show significant improvement in terms of sample efficiency on discrete and continuous control tasks.
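The core mechanism described in the abstract, weighting each sample's temporal-difference loss by the inverse of its estimated target variance, can be illustrated with a minimal sketch. The names below (ProbabilisticQNet, batch_inverse_variance_loss, critic_update) and the specific network sizes are illustrative assumptions, not the authors' released code; the sketch only shows how a probabilistic ensemble's predictive variance could feed a batch inverse-variance weighted critic loss.

```python
# Hedged sketch: inverse-variance weighted TD loss using a probabilistic ensemble.
# Class and function names are illustrative, not the paper's implementation.
import torch
import torch.nn as nn


class ProbabilisticQNet(nn.Module):
    """Q-network predicting a mean Q-value and a log-variance (aleatoric part)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, 1)
        self.logvar_head = nn.Linear(hidden, 1)

    def forward(self, obs, act):
        h = self.body(torch.cat([obs, act], dim=-1))
        return self.mean_head(h), self.logvar_head(h)


def batch_inverse_variance_loss(td_error, target_var, eps=1e-2):
    """Weight each squared TD error by the inverse of its estimated target
    variance, with the weights normalized over the batch."""
    weights = 1.0 / (target_var + eps)
    weights = weights / weights.sum()
    return (weights * td_error.pow(2)).sum()


def critic_update(q_net, target_ensemble, optimizer, batch, gamma=0.99):
    """One critic step: ensemble disagreement + predicted noise give the
    target variance, which down-weights noisy supervision signals."""
    obs, act, rew, next_obs, next_act, done = batch

    with torch.no_grad():
        means, variances = [], []
        for tgt in target_ensemble:
            mu, logvar = tgt(next_obs, next_act)
            means.append(mu)
            variances.append(logvar.exp())
        means = torch.stack(means)                 # [ensemble, batch, 1]
        variances = torch.stack(variances)
        # Predictive variance = mean aleatoric variance + epistemic disagreement.
        target_q = means.mean(dim=0)
        target_var = variances.mean(dim=0) + means.var(dim=0)
        target = rew + gamma * (1.0 - done) * target_q

    q_pred, _ = q_net(obs, act)
    loss = batch_inverse_variance_loss(q_pred - target, gamma ** 2 * target_var)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under these assumptions, samples whose bootstrapped targets are highly uncertain (large ensemble disagreement or predicted noise) contribute less to the critic update, which is the intuition behind the reported sample-efficiency gains.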
