Paper Title

Continual Model-Based Reinforcement Learning with Hypernetworks

Paper Authors

Yizhou Huang, Kevin Xie, Homanga Bharadhwaj, Florian Shkurti

Abstract

Effective planning in model-based reinforcement learning (MBRL) and model-predictive control (MPC) relies on the accuracy of the learned dynamics model. In many instances of MBRL and MPC, this model is assumed to be stationary and is periodically re-trained from scratch on state transition experience collected from the beginning of environment interactions. This implies that the time required to train the dynamics model - and the pause required between plan executions - grows linearly with the size of the collected experience. We argue that this is too slow for lifelong robot learning and propose HyperCRL, a method that continually learns the encountered dynamics in a sequence of tasks using task-conditional hypernetworks. Our method has three main attributes: first, it includes dynamics learning sessions that do not revisit training data from previous tasks, so it only needs to store the most recent fixed-size portion of the state transition experience; second, it uses fixed-capacity hypernetworks to represent non-stationary and task-aware dynamics; third, it outperforms existing continual learning alternatives that rely on fixed-capacity networks, and performs competitively with baselines that remember an ever-increasing coreset of past experience. We show that HyperCRL is effective in continual model-based reinforcement learning in robot locomotion and manipulation scenarios, such as tasks involving pushing and door opening. Our project website with videos is at this link https://rvl.cs.toronto.edu/blog/2020/hypercrl
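To make the idea of task-conditional dynamics generation concrete, the sketch below shows one plausible way to structure a fixed-capacity hypernetwork that maps a per-task embedding to the weights of a small dynamics model predicting the next state from the current state and action. This is a minimal illustration of the general mechanism described in the abstract, not the authors' implementation; all names and dimensions (TaskConditionedHypernet, STATE_DIM, EMBED_DIM, and so on) are assumptions chosen for readability, and the continual-learning regularization that protects earlier tasks is omitted.

```python
# Minimal sketch (not the authors' code) of a task-conditional hypernetwork:
# given a learned task embedding, it generates the weights of a small
# dynamics model s_{t+1} = f(s_t, a_t). Names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, HIDDEN = 8, 2, 64   # assumed toy dimensions
EMBED_DIM = 16                             # size of each task embedding

class TaskConditionedHypernet(nn.Module):
    """Maps a task embedding to the full weight set of a one-hidden-layer dynamics MLP."""
    def __init__(self):
        super().__init__()
        in_dim = STATE_DIM + ACTION_DIM
        # Shapes of the generated ("target") dynamics-network parameters.
        self.shapes = [
            (HIDDEN, in_dim), (HIDDEN,),        # layer 1 weight, bias
            (STATE_DIM, HIDDEN), (STATE_DIM,),  # layer 2 weight, bias
        ]
        n_params = sum(torch.Size(s).numel() for s in self.shapes)
        self.net = nn.Sequential(
            nn.Linear(EMBED_DIM, 128), nn.ReLU(),
            nn.Linear(128, n_params),
        )
        # One trainable embedding per task; new tasks add a small embedding
        # while the hypernetwork itself stays fixed-capacity.
        self.task_embeddings = nn.ParameterList()

    def add_task(self):
        self.task_embeddings.append(nn.Parameter(torch.randn(EMBED_DIM) * 0.1))
        return len(self.task_embeddings) - 1

    def generate_weights(self, task_id):
        flat = self.net(self.task_embeddings[task_id])
        weights, offset = [], 0
        for shape in self.shapes:
            n = torch.Size(shape).numel()
            weights.append(flat[offset:offset + n].view(shape))
            offset += n
        return weights

    def predict_next_state(self, task_id, state, action):
        w1, b1, w2, b2 = self.generate_weights(task_id)
        x = torch.cat([state, action], dim=-1)
        h = torch.relu(F.linear(x, w1, b1))
        return state + F.linear(h, w2, b2)   # model outputs a state delta

# Usage sketch: each new dynamics-learning session works only with the
# latest batch of transitions for the current task.
hnet = TaskConditionedHypernet()
task = hnet.add_task()
s = torch.randn(32, STATE_DIM)
a = torch.randn(32, ACTION_DIM)
s_next_pred = hnet.predict_next_state(task, s, a)   # shape (32, STATE_DIM)
```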
