Paper Title
Lyapunov Design for Robust and Efficient Robotic Reinforcement Learning
Paper Authors
Paper Abstract
Recent advances in the reinforcement learning (RL) literature have enabled roboticists to automatically train complex policies in simulated environments. However, due to the poor sample complexity of these methods, solving RL problems using real-world data remains challenging. This paper introduces a novel cost-shaping method that aims to reduce the number of samples needed to learn a stabilizing controller. The method adds a term involving a Control Lyapunov Function (CLF) -- an 'energy-like' function from the model-based control literature -- to typical cost formulations. Theoretical results demonstrate that the new costs lead to stabilizing controllers when smaller discount factors are used, which is well known to reduce sample complexity. Moreover, the addition of the CLF term 'robustifies' the search for a stabilizing controller by ensuring that even highly sub-optimal policies will stabilize the system. We demonstrate our approach with two hardware examples in which we learn stabilizing controllers for a cartpole and an A1 quadruped with only seconds and a few minutes of fine-tuning data, respectively. Furthermore, simulation benchmark studies show that obtaining stabilizing policies by optimizing our proposed costs requires orders of magnitude less data compared to standard cost designs.
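The abstract does not specify the exact form of the CLF shaping term. As a rough illustration only, the sketch below shows one common way a CLF can be folded into an RL cost: penalizing the one-step change of an 'energy-like' function V along each transition. The function names, the quadratic choice of V, and the shaping_weight parameter are assumptions for this example, not the paper's formulation.

```python
import numpy as np

def clf_shaped_cost(x, u, x_next, standard_cost, clf, shaping_weight=1.0):
    """Hypothetical CLF-shaped cost: a standard per-step cost plus the
    one-step change of a Control Lyapunov Function V along the transition.
    This is an illustrative form; the paper's exact term may differ."""
    return standard_cost(x, u) + shaping_weight * (clf(x_next) - clf(x))

# Toy example with a quadratic CLF V(x) = x^T P x (assumed for illustration).
P = np.eye(2)
clf = lambda x: float(x @ P @ x)
standard_cost = lambda x, u: float(x @ x + 0.1 * u @ u)

x = np.array([1.0, 0.0])       # current state
u = np.array([-0.5])           # applied control
x_next = np.array([0.9, -0.1]) # next state from some (unspecified) dynamics
print(clf_shaped_cost(x, u, x_next, standard_cost, clf))
```

Under this kind of shaping, transitions that decrease V are rewarded, so even a policy that is far from optimal for the original cost is pushed toward stabilizing behavior, which is the intuition the abstract describes.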