Sample-efficient Cross-Entropy Method for Real-time Planning

Title

Sample-efficient Cross-Entropy Method for Real-time Planning

Authors

Cristina Pinneri, Shambhuraj Sawant, Sebastian Blaes, Jan Achterhold, Joerg Stueckler, Michal Rolinek, Georg Martius

Abstract

Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency prevents them from being used for real-time planning and control. We propose an improved version of the CEM algorithm for fast planning, with novel additions including temporally-correlated actions and memory, requiring 2.7-22x fewer samples and yielding a performance increase of 1.2-10x in high-dimensional control problems.
