DiffTune: Auto-Tuning through Auto-Differentiation

Paper Title

DiffTune: Auto-Tuning through Auto-Differentiation

Authors

Cheng, Sheng; Kim, Minkyung; Song, Lin; Yang, Chengyu; Jin, Yiquan; Wang, Shenlong; Hovakimyan, Naira

Abstract

The performance of robots in high-level tasks depends on the quality of their low-level controllers, which require fine-tuning. However, the intrinsically nonlinear dynamics and controllers make tuning by hand a challenging task. In this paper, we present DiffTune, a novel, gradient-based automatic tuning framework. We formulate controller tuning as a parameter optimization problem. Our method unrolls the dynamical system and controller as a computational graph and updates the controller parameters through gradient-based optimization. The gradient is obtained using sensitivity propagation, which is the only method for gradient computation when tuning for a physical system instead of its simulated counterpart. Furthermore, we use $\mathcal{L}_1$ adaptive control to compensate for the uncertainties (which unavoidably exist in a physical system) so that the gradient is not biased by the unmodelled uncertainties. We validate DiffTune on a Dubins car and a quadrotor in challenging simulation environments. In comparison with state-of-the-art auto-tuning methods, DiffTune achieves the best performance in a more efficient manner owing to its effective use of the first-order information of the system. Experiments on tuning a nonlinear controller for a quadrotor show promising results: DiffTune achieves a 3.5x reduction in tracking error on an aggressive trajectory in only 10 trials over a 12-dimensional controller parameter space.
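
To make the idea of sensitivity propagation concrete, the following is a minimal sketch on a toy problem, not the authors' implementation. It assumes a scalar discrete-time system x_{n+1} = x_n + dt * u_n with a proportional controller u_n = -k * (x_n - r) and a sum-of-squared-errors loss. The system, the loss, and the names `rollout_with_sensitivity` and `tune` are all illustrative assumptions. The sensitivity s_n = dx_n/dk is carried forward alongside the state by the chain rule, so the gradient of the loss with respect to the gain k is available at the end of one rollout. The $\mathcal{L}_1$ adaptive control component mentioned in the abstract is not modelled here.

```python
def rollout_with_sensitivity(k, x0=1.0, r=0.0, dt=0.1, steps=50):
    """Roll out the toy closed loop and return (loss, d loss / d k).

    Toy assumptions (not from the paper):
      dynamics   f(x, u) = x + dt * u
      controller h(x, k) = -k * (x - r)
      loss       sum over steps of (x_n - r)^2
    """
    x = x0
    s = 0.0  # sensitivity dx/dk; the initial state does not depend on k
    loss = 0.0
    grad = 0.0
    for _ in range(steps):
        e = x - r
        u = -k * e
        # Partial derivatives of the dynamics and the controller.
        df_dx, df_du = 1.0, dt
        dh_dx, dh_dk = -k, -e
        # Propagate the state and its sensitivity one step forward.
        x_next = x + dt * u
        s_next = (df_dx + df_du * dh_dx) * s + df_du * dh_dk
        x, s = x_next, s_next
        # Accumulate the loss and its gradient via the chain rule.
        loss += (x - r) ** 2
        grad += 2.0 * (x - r) * s
    return loss, grad


def tune(k=0.5, lr=0.02, iters=10):
    """Plain gradient descent on the gain k; returns (final k, loss history)."""
    history = []
    for _ in range(iters):
        loss, grad = rollout_with_sensitivity(k)
        history.append(loss)
        k -= lr * grad
    return k, history


if __name__ == "__main__":
    k_final, history = tune()
    print("final gain:", k_final)
    print("loss per trial:", history)
```

In this sketch each call to `rollout_with_sensitivity` plays the role of one trial. The state could in principle come from measurements rather than simulation, while the partial derivatives are evaluated from the assumed model along the recorded trajectory.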
