Paper Title
Online Planning in POMDPs with Self-Improving Simulators
Paper Authors
Paper Abstract
How can we plan efficiently in a large and complex environment when the time budget is limited? Given that the original simulator of the environment may be computationally very demanding, we propose to learn online an approximate but much faster simulator that improves over time. To plan reliably and efficiently while the approximate simulator is still learning, we develop a method that adaptively decides which simulator to use for each simulation, based on a statistic that measures the accuracy of the approximate simulator. This allows us to replace the original simulator with the approximate one, for faster simulations, whenever the latter is accurate enough in the current context, thus trading off simulation speed against accuracy. Experimental results in two large domains show that, when integrated with POMCP, our approach allows planning with improving efficiency over time.
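The adaptive choice between simulators described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify the accuracy statistic or the decision rule, so the running mean error per context, the fixed `threshold`, the `min_samples` warm-up, and all names (`SimulatorSelector`, `record_error`, `use_approximate`) are assumptions made purely for illustration.

```python
from collections import defaultdict


class SimulatorSelector:
    """Hypothetical selector: chooses a simulator per simulation.

    Assumption: accuracy is tracked as a running mean of an error
    measurement per context; the abstract does not define the statistic.
    """

    def __init__(self, threshold=0.1, min_samples=5):
        self.threshold = threshold      # assumed accuracy cut-off
        self.min_samples = min_samples  # assumed warm-up before trusting the estimate
        self.error_sum = defaultdict(float)
        self.count = defaultdict(int)

    def record_error(self, context, error):
        # Store one error measurement of the approximate simulator,
        # e.g. obtained by comparing it against the original simulator.
        self.error_sum[context] += error
        self.count[context] += 1

    def mean_error(self, context):
        n = self.count[context]
        return self.error_sum[context] / n if n else float("inf")

    def use_approximate(self, context):
        # Fall back to the original simulator until enough evidence exists.
        if self.count[context] < self.min_samples:
            return False
        return self.mean_error(context) <= self.threshold


selector = SimulatorSelector(threshold=0.1, min_samples=3)
for err in (0.05, 0.05, 0.05):
    selector.record_error("root", err)

if selector.use_approximate("root"):
    print("use the fast approximate simulator")
else:
    print("use the original simulator")
```

The warm-up count keeps the planner on the original simulator while the approximate one has been measured too few times, which reflects the abstract's aim of planning reliably while the approximate simulator is still learning.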