Sharp Analysis of Smoothed Bellman Error Embedding

Paper title

Sharp Analysis of Smoothed Bellman Error Embedding

Authors

Ahmed Touati, Pascal Vincent

Abstract

The \textit{Smoothed Bellman Error Embedding} algorithm~\citep{dai2018sbeed}, known as SBEED, was proposed as a provably convergent reinforcement learning algorithm with general nonlinear function approximation. It has been successfully implemented with neural networks and has achieved strong empirical results. In this work, we study the theoretical behavior of SBEED in batch-mode reinforcement learning. We prove a near-optimal performance guarantee that depends on the representation power of the function classes used and on a tight notion of distribution shift. Our results improve upon the prior guarantees for SBEED in~\citet{dai2018sbeed} in terms of the dependence on the planning horizon and on the sample size. Our analysis builds on the recent work of~\citet{Xie2020}, which studies a related algorithm, MSBO, that can be interpreted as a \textit{non-smooth} counterpart of SBEED.
