Paper Title

Robust Policy Optimization in Continuous-time Mixed $\mathcal{H}_2/\mathcal{H}_\infty$ Stochastic Control

Authors

Leilei Cui, Lekan Molu

Abstract

Following the recent resurgence in establishing linear control-theoretic benchmarks for reinforcement learning (RL)-based policy optimization (PO) for complex dynamical systems with continuous state and action spaces, an optimal control problem is posed for a continuous-time infinite-dimensional linear stochastic system with additive Brownian motion, on a cost that is an exponential of the quadratic form of the state, input, and disturbance terms. We lay out model-based and model-free algorithms for RL-based stochastic PO. For the model-based algorithm, we establish rigorous convergence guarantees. For the sampling-based algorithm, over trajectory arcs that emanate from the phase space, we find that the Hamilton-Jacobi-Bellman equation parameterizes trajectory costs, resulting in a discrete-time (input- and state-based) sampling scheme, with unknown nonlinear dynamics, accompanied by continuous-time policy iterates. The need for known dynamics operators is circumvented, and we arrive at a reinforced PO algorithm (via policy iteration) in which an upper bound on the $\mathcal{H}_2$ norm is minimized (to guarantee stability) and a robustness metric is enforced by maximizing the cost with respect to a controller that includes the noise-attenuation level specified by the system's $\mathcal{H}_\infty$ norm. A rigorous robustness analysis is prescribed in an input-to-state stability formalism. Our analyses and contributions are distinguished by the many natural systems characterized by additive Wiener processes, amenable to Itô's stochastic differential calculus in dynamic game settings.
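To make the model-based route concrete, below is a minimal sketch of Kleinman-type policy iteration for a continuous-time mixed $\mathcal{H}_2/\mathcal{H}_\infty$ linear-quadratic problem. This is an illustration under simplifying assumptions, not the paper's algorithm: the matrices `A, B, D, Q, R`, the attenuation level `gamma`, and the helpers `evaluate_policy`/`policy_iteration` are all hypothetical, and the paper's Itô-stochastic setting and convergence proofs are not reproduced here.

```python
# Hypothetical sketch of model-based policy iteration for a continuous-time
# mixed H2/H-infinity linear-quadratic problem. All matrices and helper names
# are illustrative assumptions, not the paper's algorithm.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def evaluate_policy(A, B, D, Q, R, K, gamma, inner_iters=50):
    """For a fixed stabilizing gain K, approximate the solution P of the
    game-theoretic Riccati equation
        (A - B K)' P + P (A - B K) + gamma^{-2} P D D' P + Q + K' R K = 0
    by alternating Lyapunov solves with the worst-case disturbance gain L."""
    n = A.shape[0]
    L = np.zeros((D.shape[1], n))               # disturbance policy w = L x
    P = np.zeros((n, n))
    for _ in range(inner_iters):
        Acl = A - B @ K + D @ L                 # closed loop under (K, L)
        S = Q + K.T @ R @ K - gamma**2 * (L.T @ L)
        P = solve_continuous_lyapunov(Acl.T, -S)  # Acl' P + P Acl = -S
        L = (D.T @ P) / gamma**2                # disturbance ascent update
    return P


def policy_iteration(A, B, D, Q, R, gamma, K0, outer_iters=20):
    """Outer loop: evaluate the current gain, then improve K <- R^{-1} B' P."""
    K = K0
    for _ in range(outer_iters):
        P = evaluate_policy(A, B, D, Q, R, K, gamma)
        K = np.linalg.solve(R, B.T @ P)         # policy improvement step
    return K, P


if __name__ == "__main__":
    # Toy 2-state example; the numbers are made up for illustration only.
    A = np.array([[0.0, 1.0], [-1.0, -0.5]])    # already Hurwitz, so K0 = 0 works
    B = np.array([[0.0], [1.0]])
    D = np.array([[0.1], [0.1]])
    Q, R, gamma = np.eye(2), np.eye(1), 5.0
    K, P = policy_iteration(A, B, D, Q, R, gamma, K0=np.zeros((1, 2)))
    print("robust gain K =\n", K)
```

The inner loop treats the worst-case disturbance as the maximizing player of a zero-sum game, mirroring the dynamic-game view in the abstract, while the outer loop is the standard Kleinman improvement step that drives down the $\mathcal{H}_2$-type cost at the prescribed attenuation level.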
