Paper Title
Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation
Paper Authors
Paper Abstract
This paper studies the robustness of reinforcement learning algorithms to errors in the learning process. Specifically, we revisit the benchmark problem of discrete-time linear quadratic regulation (LQR) and study the long-standing open question: under what conditions is the policy iteration method robustly stable from a dynamical systems perspective? Using advanced stability results from control theory, it is shown that policy iteration for LQR is inherently robust to small errors in the learning process and enjoys small-disturbance input-to-state stability: whenever the error in each iteration is bounded and small, the solutions of the policy iteration algorithm are also bounded and, moreover, enter and stay in a small neighbourhood of the optimal LQR solution. As an application, a novel off-policy optimistic least-squares policy iteration is proposed for the LQR problem in which the system dynamics are subject to additive stochastic disturbances. The new results in robust reinforcement learning are validated by a numerical example.
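For orientation, the sketch below illustrates the error-free, model-based policy iteration for discrete-time LQR (Hewer-style iteration) whose robustness to per-iteration errors the abstract refers to; it is not the paper's off-policy algorithm. The system matrices A, B, cost weights Q, R, and the initial stabilising gain K0 are illustrative placeholders chosen here, not values from the paper.

```python
# Minimal sketch (assumed setup, not the paper's exact algorithm) of exact
# policy iteration for discrete-time LQR: evaluate the current gain via a
# discrete Lyapunov equation, then improve it via the LQR gain formula.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_policy_iteration(A, B, Q, R, K0, num_iters=20):
    """Starting from a stabilising gain K0, iterate policy evaluation and
    policy improvement; with no errors injected, the gains converge to the
    optimal LQR gain."""
    K = K0
    for _ in range(num_iters):
        A_cl = A - B @ K
        # Policy evaluation: solve P = A_cl' P A_cl + Q + K' R K
        P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
        # Policy improvement: K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Example usage with illustrative (assumed) system and cost matrices
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])   # stabilising for this particular (A, B)
K_opt, P_opt = lqr_policy_iteration(A, B, Q, R, K0)
```

In the paper's setting, the policy-evaluation step is carried out from data rather than from the model, so each iteration is perturbed; the small-disturbance input-to-state stability result stated in the abstract guarantees that, as long as those perturbations stay small, the iterates above remain bounded and settle near the optimal solution.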