Paper Title
Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
Paper Authors
Paper Abstract
In recent years, a growing number of deep model-based reinforcement learning (RL) methods have been introduced. The interest in deep model-based RL is not surprising, given its many potential benefits, such as higher sample efficiency and the potential for fast adaptation to changes in the environment. However, we demonstrate, using an improved version of the recently introduced Local Change Adaptation (LoCA) setup, that well-known model-based methods such as PlaNet and DreamerV2 perform poorly in their ability to adapt to local environmental changes. Combined with prior work that made a similar observation about another popular model-based method, MuZero, a trend appears to emerge, suggesting that current deep model-based methods have serious limitations. We dive deeper into the causes of this poor performance by identifying elements that hurt adaptive behavior and linking these to underlying techniques frequently used in deep model-based RL. We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes. Furthermore, we provide detailed insights into the challenges of building an adaptive nonlinear model-based method by experimenting with a nonlinear version of Dyna.
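To make the evaluation protocol behind these results concrete, the following is a minimal sketch of a LoCA-style experiment: train under one reward configuration, then change the reward locally while confining the agent's experience to the neighborhood of that change, and finally test whether the agent's global policy adapts. The env/agent interface (configure, act, train, the "T1"/"T2" terminal labels) is an illustrative assumption, not the paper's reference implementation; the reward values follow the original LoCA description.

```python
# Minimal sketch of a LoCA-style evaluation loop. The env/agent API
# below is an illustrative assumption, not the paper's implementation.

def eval_policy(env, agent, eval_episodes=100):
    """Fraction of greedy evaluation episodes ending in terminal T2,
    the optimal terminal after the local reward change."""
    successes = 0
    for _ in range(eval_episodes):
        state = env.reset(start_region="far_from_change")
        done, info = False, {}
        while not done:
            action = agent.act(state, greedy=True)
            state, reward, done, info = env.step(action)
        successes += int(info.get("terminal") == "T2")
    return successes / eval_episodes

def run_loca(env, agent, steps_phase1, steps_phase2):
    # Phase 1: r(T1) = 4 > r(T2) = 2; the agent trains everywhere and
    # should learn to navigate to T1 from all states.
    env.configure(rewards={"T1": 4.0, "T2": 2.0}, local_training_only=False)
    agent.train(env, steps_phase1)

    # Phase 2: the reward at T1 drops to 1, and training experience is
    # confined to a small region around T1, so only local transitions
    # reveal the change. T2 is now the better terminal.
    env.configure(rewards={"T1": 1.0, "T2": 2.0}, local_training_only=True)
    agent.train(env, steps_phase2)

    # An adaptive agent should now reroute to T2 even from start states
    # it never revisited during phase 2.
    return eval_policy(env, agent)
```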
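The linear Dyna result can likewise be made concrete. The property that enables adaptation is that planning backs values up through a learned linear model, so a model updated only near the local change can still propagate that change to the value function everywhere. Below is a minimal sketch of a standard linear Dyna agent in the style of Sutton et al. (2008); the step sizes, feature sampler, and class layout are illustrative assumptions and omit the paper's specific modifications.

```python
import numpy as np

# Minimal sketch of linear Dyna: a linear value function plus per-action
# linear models of expected next features (F) and expected reward (b).
# Hyperparameters and interfaces here are illustrative assumptions.

class LinearDyna:
    def __init__(self, n_features, n_actions, alpha=0.1, beta=0.1, gamma=0.97):
        self.theta = np.zeros(n_features)                        # value weights
        self.F = np.zeros((n_actions, n_features, n_features))  # next-feature models
        self.b = np.zeros((n_actions, n_features))               # reward models
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def update_model(self, phi, a, r, phi_next):
        # Move the linear model for action a toward the observed transition.
        self.F[a] += self.beta * np.outer(phi_next - self.F[a] @ phi, phi)
        self.b[a] += self.beta * (r - self.b[a] @ phi) * phi

    def plan(self, sample_phi, n_steps=10):
        # Value-iteration-style backups on model-generated features,
        # e.g. sample_phi = lambda: np.eye(n_features)[rng.integers(n_features)].
        for _ in range(n_steps):
            phi = sample_phi()
            backups = [self.b[a] @ phi + self.gamma * self.theta @ (self.F[a] @ phi)
                       for a in range(len(self.b))]
            delta = max(backups) - self.theta @ phi
            self.theta += self.alpha * delta * phi
```

Because plan() consults only the learned model (F, b), updating that model around the changed terminal is enough for repeated planning sweeps to revise the value function globally, which is exactly the adaptivity the LoCA setup is designed to measure.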