Paper Title
Selective Dyna-style Planning Under Limited Model Capacity
Paper Authors
Paper Abstract
In model-based reinforcement learning, planning with an imperfect model of the environment can harm learning progress. Even an imperfect model, however, may still contain information that is useful for planning. In this paper, we investigate the idea of using an imperfect model selectively: the agent should plan in parts of the state space where the model would be helpful and refrain from using the model where it would be harmful. An effective selective planning mechanism requires estimating predictive uncertainty, which arises from aleatoric uncertainty, parameter uncertainty, and model inadequacy, among other sources. Prior work has focused on parameter uncertainty for selective planning. In this work, we emphasize the importance of model inadequacy. We show that heteroscedastic regression can signal predictive uncertainty arising from model inadequacy, and that this signal is complementary to the one detected by methods designed for parameter uncertainty. This indicates that considering both parameter uncertainty and model inadequacy may be a more promising direction for effective selective planning than considering either in isolation.
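As a rough illustration of the heteroscedastic regression idea mentioned in the abstract, the sketch below fits an input-dependent variance by minimizing a Gaussian negative log-likelihood and then uses that variance to decide where to plan. It is not the authors' implementation: the function names, the per-region constant variance, the residual values, and the fixed threshold gate are all assumptions made for illustration, and a real agent would presumably predict the variance with a learned function approximator.

```python
import math


def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of `target` under N(mean, exp(log_var))."""
    var = math.exp(log_var)
    return 0.5 * (math.log(2.0 * math.pi) + log_var + (target - mean) ** 2 / var)


def fit_log_var(residuals):
    """Closed-form minimizer of the mean Gaussian NLL over a constant log-variance.

    Setting the derivative of (log_var + mse * exp(-log_var)) to zero gives
    log_var = log(mse), so the fitted variance equals the mean squared residual.
    """
    mse = sum(r * r for r in residuals) / len(residuals)
    return math.log(mse)


def should_plan(predicted_log_var, threshold):
    """Hypothetical gate: use the model only where its predicted variance is low."""
    return math.exp(predicted_log_var) <= threshold


# Made-up prediction errors (target minus predicted mean) in two regions.
residuals_adequate = [0.01, -0.02, 0.015]   # model fits this region well
residuals_inadequate = [0.8, -1.1, 0.9]     # model lacks capacity here

log_var_adequate = fit_log_var(residuals_adequate)
log_var_inadequate = fit_log_var(residuals_inadequate)

THRESHOLD = 0.05  # assumed variance cutoff, chosen only for this example
plan_in_adequate = should_plan(log_var_adequate, THRESHOLD)
plan_in_inadequate = should_plan(log_var_inadequate, THRESHOLD)
```

Because the variance is fit to the errors the model actually makes, it grows in regions where the model cannot represent the dynamics even when the environment is deterministic and plenty of data is available. That is the kind of signal the abstract attributes to model inadequacy rather than to parameter uncertainty.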