论文标题

最佳的甲骨文不平等现象,用于求解投影的定点方程

Optimal oracle inequalities for solving projected fixed-point equations

论文作者

Mou, Wenlong, Pananjady, Ashwin, Wainwright, Martin J.

论文摘要

希尔伯特空间中的线性固定点方程在各种环境中都出现,包括增强学习以及用于求解差分方程的计算方法。我们研究了使用随机观测值集合来计算近似解决方案的方法,通过搜索Hilbert空间的已知低维子空间来计算近似解决方案。首先,我们证明了实例依赖性上限在利用polyak-rppert平均的线性随机近似方案的均方误差上。该界限由两个术语组成:一个近似误差项,具有实例依赖性近似因子,以及一个统计误差项,该项捕获噪声特定于实例的复杂性,当投影到低维子空间上时。使用信息理论方法,我们还建立了下界,表明这两个术语在实例依赖性意义上再次无法改进。我们表征的具体后果是,该问题的最佳近似因素可能比通用常数大得多。我们展示了我们的结果如何精确地表征具有线性函数近似策略评估问题的一类时间差异学习方法的误差,从而确立了它们的最佳性。

Linear fixed point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak--Ruppert averaging. This bound consists of two terms: an approximation error term with an instance-dependent approximation factor, and a statistical error term that captures the instance-specific complexity of the noise when projected onto the low-dimensional subspace. Using information theoretic methods, we also establish lower bounds showing that both of these terms cannot be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, establishing their optimality.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源