Regret-optimal control in dynamic environments

Paper title

Regret-optimal control in dynamic environments

Authors

Gautam Goel, Babak Hassibi

Abstract

We consider control in linear time-varying dynamical systems from the perspective of regret minimization. Unlike most prior work in this area, we focus on the problem of designing an online controller which minimizes regret against the best dynamic sequence of control actions selected in hindsight (dynamic regret), instead of the best fixed controller in some specific class of controllers (static regret). This formulation is attractive when the environment changes over time and no single controller achieves good performance over the entire time horizon. We derive the state-space structure of the regret-optimal controller via a novel reduction to $H_{\infty}$ control and present a tight data-dependent bound on its regret in terms of the energy of the disturbance. Our results easily extend to the model-predictive setting where the controller can anticipate future disturbances and to settings where the controller only affects the system dynamics after a fixed delay. We present numerical experiments which show that our regret-optimal controller interpolates between the performance of the $H_2$-optimal and $H_{\infty}$-optimal controllers across stochastic and adversarial environments.
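The distinction between dynamic and static regret can be written out as follows. This is only a sketch in assumed notation: the symbols $A_t$, $B_t$, $Q_t$, $R_t$, $w_t$, $J_T$, $\mathcal{K}$ and $\gamma$ are illustrative choices, not taken from the abstract, and the last line shows only the general shape of an energy-type bound, not the paper's actual constant.

```latex
% Assumed linear time-varying dynamics with disturbance w_t
x_{t+1} = A_t x_t + B_t u_t + w_t, \qquad t = 0, \dots, T-1

% Assumed quadratic cost incurred by a control sequence u under disturbance w
J_T(u, w) = \sum_{t=0}^{T-1} \left( x_t^\top Q_t x_t + u_t^\top R_t u_t \right)

% Dynamic regret: comparator is the best sequence of actions chosen in hindsight
\mathrm{Regret}_{\mathrm{dyn}}(u, w) = J_T(u, w) - \min_{u'_0, \dots, u'_{T-1}} J_T(u', w)

% Static regret: comparator is the best fixed controller K in a class \mathcal{K},
% where u^K denotes the actions generated by K
\mathrm{Regret}_{\mathrm{stat}}(u, w) = J_T(u, w) - \min_{K \in \mathcal{K}} J_T(u^K, w)

% General shape of a data-dependent bound in terms of disturbance energy
\mathrm{Regret}_{\mathrm{dyn}}(u, w) \le \gamma^2 \sum_{t=0}^{T-1} \lVert w_t \rVert_2^2
```

Under these assumptions, every fixed controller in $\mathcal{K}$ produces some sequence of actions, so the hindsight-optimal sequence is at least as strong a comparator and dynamic regret is never smaller than static regret.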
