Robustness to Incorrect Models and Data-Driven Learning in Average-Cost Optimal Stochastic Control

Title

Robustness to Incorrect Models and Data-Driven Learning in Average-Cost Optimal Stochastic Control

Authors

Ali Devran Kara, Maxim Raginsky, Serdar Yuksel

Abstract

We study continuity and robustness properties of infinite-horizon average expected cost problems with respect to (controlled) transition kernels, and we apply these results to the robustness of control policies that are designed for approximate models but applied to actual systems. We show that the sufficient conditions presented in the literature for discounted-cost problems are in general not sufficient to ensure robustness for average-cost problems. However, we show that the optimal average cost is continuous under convergence of controlled transition kernel models, where convergence of models entails (i) continuous weak convergence in state and actions, and (ii) continuous setwise convergence in the actions for every fixed state variable, in addition to either uniform ergodicity or some regularity conditions. We establish that the mismatch error incurred by applying a control policy designed for an incorrectly estimated model to the true model decreases to zero as the incorrect model approaches the true model under the stated convergence criteria. Our findings significantly relax the conditions of related studies in the literature, which have primarily considered the more restrictive total variation convergence criterion. We also study applications to robustness to models estimated from empirical data (where the almost sure weak convergence criterion typically holds, but stronger criteria do not), and establish conditions for asymptotic robustness to data-driven learning.
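
The quantities named in the abstract can be written out as follows. This is only a sketch of one standard formalization: the symbols (true kernel T, approximating kernels T_n, stage cost c, policy gamma, initial state x_0) are assumed here for illustration and are not taken from this listing, and the precise mode of convergence T_n -> T is the one described in the abstract.

```latex
% Assumed notation: T is the true controlled transition kernel, T_n an
% approximate (e.g. estimated) kernel, c the stage cost, \gamma a policy.
\begin{align*}
  % Average expected cost of policy \gamma when the system evolves under T
  J_\infty(T, \gamma)
    &= \limsup_{N \to \infty} \frac{1}{N}\,
       E^{T,\gamma}_{x_0}\!\left[ \sum_{t=0}^{N-1} c(x_t, u_t) \right], \\
  % Optimal average cost for the model T
  J_\infty^*(T) &= \inf_{\gamma} J_\infty(T, \gamma), \\
  % Continuity: optimal costs of the approximate models converge
  J_\infty^*(T_n) &\to J_\infty^*(T)
    \quad \text{as } T_n \to T, \\
  % Robustness: \gamma_n^* is optimal for T_n but is applied to the true T
  J_\infty(T, \gamma_n^*) - J_\infty^*(T) &\to 0
    \quad \text{as } T_n \to T.
\end{align*}
```

In this notation, the last line is the mismatch error that the abstract states decreases to zero, and the third line is the continuity property of the optimal average cost.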
