Policy-Aware Model Learning for Policy Gradient Methods

Paper Title

Policy-Aware Model Learning for Policy Gradient Methods

Authors

Romina Abachi, Mohammad Ghavamzadeh, Amir-massoud Farahmand

Abstract

This paper considers the problem of learning a model in model-based reinforcement learning (MBRL). We examine how the planning module of an MBRL algorithm uses the model, and propose that the model learning module should incorporate the way the planner is going to use the model. This contrasts with conventional model learning approaches, such as those based on maximum likelihood estimation, which learn a predictive model of the environment without explicitly considering the interaction between the model and the planner. We focus on policy gradient planning algorithms and derive new loss functions for model learning that incorporate how the planner uses the model. We call this approach Policy-Aware Model Learning (PAML). We theoretically analyze a generic model-based policy gradient algorithm and provide a convergence guarantee for the optimized policy. We also empirically evaluate PAML on some benchmark problems, showing promising results.
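The contrast drawn in the abstract, between a likelihood-style model loss and a loss that accounts for how a policy gradient planner uses the model, can be illustrated with a small sketch. The abstract does not state the form of the loss, so everything below is an assumption made for illustration: a tabular MDP with a known reward, a softmax policy, exact returns computed by solving a linear system, gradients obtained by finite differences, and a "policy-aware" loss taken to be the squared distance between the policy gradient under the true dynamics and under the learned model. All function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def softmax_policy(theta):
    """Map logits of shape (S, A) to action probabilities of shape (S, A)."""
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def expected_return(theta, P, r, mu0, gamma):
    """Exact discounted return of the softmax policy in a tabular MDP.

    P: (S, A, S) transition probabilities, r: (S, A) rewards,
    mu0: (S,) initial state distribution.
    """
    pi = softmax_policy(theta)
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)            # expected reward per state
    v = np.linalg.solve(np.eye(len(mu0)) - gamma * P_pi, r_pi)
    return float(mu0 @ v)


def policy_gradient(theta, P, r, mu0, gamma, eps=1e-5):
    """Central finite-difference gradient of the return w.r.t. theta."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        d = np.zeros_like(theta)
        d[idx] = eps
        up = expected_return(theta + d, P, r, mu0, gamma)
        down = expected_return(theta - d, P, r, mu0, gamma)
        grad[idx] = (up - down) / (2 * eps)
    return grad


def policy_aware_loss(theta, P_true, P_model, r, mu0, gamma):
    """Assumed loss: squared gap between true and model policy gradients."""
    g_true = policy_gradient(theta, P_true, r, mu0, gamma)
    g_model = policy_gradient(theta, P_model, r, mu0, gamma)
    return float(np.sum((g_true - g_model) ** 2))


def kl_loss(P_true, P_model):
    """Likelihood-style baseline: mean KL(P_true || P_model) over (s, a)."""
    return float(np.mean(np.sum(P_true * np.log(P_true / P_model), axis=2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, gamma = 4, 2, 0.9
    P_true = rng.dirichlet(np.ones(S), size=(S, A))   # shape (S, A, S)
    P_model = rng.dirichlet(np.ones(S), size=(S, A))  # an untrained model
    r = rng.normal(size=(S, A))
    mu0 = np.full(S, 1.0 / S)
    theta = rng.normal(size=(S, A))
    print("KL loss:          ", kl_loss(P_true, P_model))
    print("policy-aware loss:", policy_aware_loss(theta, P_true, P_model, r, mu0, gamma))
```

In this toy setting the two losses need not rank candidate models the same way: the KL loss weighs every transition error equally, while the gradient-matching loss only penalizes errors that change the direction in which the planner would update the policy.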
