Accelerating Model-Free Policy Optimization Using Model-Based Gradient: A Composite Optimization Perspective

Paper title

Accelerating Model-Free Policy Optimization Using Model-Based Gradient: A Composite Optimization Perspective

Authors

Yansong Li, Shuo Han

Abstract

We develop an algorithm that combines model-based and model-free methods for solving a nonlinear optimal control problem with a quadratic cost in which the system model is given by a linear state-space model with a small additive nonlinear perturbation. We decompose the cost into a sum of two functions, one having an explicit form obtained from the approximate linear model, the other being a black-box model representing the unknown modeling error. The decomposition allows us to formulate the problem as a composite optimization problem. To solve the optimization problem, our algorithm performs gradient descent using the gradient obtained from the approximate linear model until backtracking line search fails, upon which the model-based gradient is compared with the exact gradient obtained from a model-free algorithm. The difference between the model gradient and the exact gradient is then used for compensating future gradient-based updates. Our algorithm is shown to decrease the number of function evaluations compared with traditional model-free methods both in theory and in practice.
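
The loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' algorithm or code; it only mirrors the steps named above under stated assumptions: the "model" cost is a hypothetical quadratic, the unknown modeling error is a hypothetical small sinusoidal term, and central finite differences stand in for the model-free gradient estimate. All function names and parameters are invented for illustration.

```python
import numpy as np

def model_grad(x):
    # Gradient of the explicit model part (assumed here: 0.5 * ||x||^2).
    return x

def true_cost(x):
    # Black-box cost: model part plus a small modeling error (assumed form).
    return 0.5 * float(x @ x) + 0.1 * float(np.sum(np.sin(x)))

def fd_grad(f, x, h=1e-6):
    # Central finite differences, a stand-in for a model-free gradient estimate.
    # Returns the gradient and the number of function evaluations it used.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g, 2 * x.size

def hybrid_descent(x0, tol=1e-6, beta=0.5, c=1e-4, max_iter=100):
    x = np.asarray(x0, dtype=float).copy()
    correction = np.zeros_like(x)  # compensation learned from past gradient mismatches
    fx, evals = true_cost(x), 1
    for _ in range(max_iter):
        g = model_grad(x) + correction  # cheap, compensated model-based gradient
        accepted = False
        if np.linalg.norm(g) > tol:
            t = 1.0
            for _k in range(30):  # backtracking line search on the true cost
                x_new = x - t * g
                f_new = true_cost(x_new)
                evals += 1
                if f_new <= fx - c * t * float(g @ g):
                    accepted = True
                    break
                t *= beta
        if accepted:
            x, fx = x_new, f_new
            continue
        # Line search failed (or the model sees a stationary point):
        # pay for a model-free gradient and compare it with the model gradient.
        g_exact, n = fd_grad(true_cost, x)
        evals += n
        if np.linalg.norm(g_exact) < tol:
            break  # the true cost is (nearly) stationary
        correction = g_exact - model_grad(x)  # compensates future updates
    return x, evals

x_star, n_evals = hybrid_descent(np.array([1.0, -2.0]))
print(x_star, n_evals)
```

In this sketch the expensive gradient estimate is requested only when the compensated model gradient stops producing sufficient decrease, which is the mechanism the abstract credits for reducing the number of function evaluations.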
