Paper Title
Fine-Grained Distribution-Dependent Learning Curves
Paper Authors
Paper Abstract
Learning curves plot the expected error of a learning algorithm as a function of the number of labeled samples it receives from a target distribution. They are widely used as a measure of an algorithm's performance, but classic PAC learning theory cannot explain their behavior. As observed by Antos and Lugosi (1996, 1998), the classic `No Free Lunch' lower bounds only trace the upper envelope above all learning curves of specific target distributions. For a concept class with VC dimension $d$, the classic bound decays like $d/n$, yet it is possible that the learning curve for \emph{every} specific distribution decays exponentially. In this case, for each $n$ there exists a different `hard' distribution requiring $d/n$ samples. Antos and Lugosi asked which concept classes admit a `strong minimax lower bound' -- a lower bound of $d'/n$ that holds for a fixed distribution for infinitely many $n$. We solve this problem in a principled manner by introducing a combinatorial dimension called VCL that characterizes the best $d'$ for which $d'/n$ is a strong minimax lower bound. Our characterization strengthens the lower bounds of Bousquet, Hanneke, Moran, van Handel, and Yehudayoff (2021), and it refines their theory of learning curves by showing that for classes with finite VCL the learning rate can be decomposed into a linear component that depends only on the hypothesis class and an exponential component that depends also on the target distribution. As a corollary, we recover the lower bound of Antos and Lugosi (1996, 1998) for half-spaces in $\mathbb{R}^d$. Finally, to provide another viewpoint on our work and how it compares to traditional PAC learning bounds, we also present an alternative formulation of our results in a language that is closer to the PAC setting.
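To make the quantifier structure behind these statements concrete, here is a minimal sketch in LaTeX of the three kinds of bounds the abstract contrasts. The notation (a learner $A$, its expected error $\operatorname{er}_P(A(S_n))$ on a sample $S_n$ of $n$ labeled examples drawn from a distribution $P$, and the constants $c$, $C(P)$, $c(P)$) is illustrative and not fixed by the abstract itself; the quantifier placement follows the abstract's informal wording rather than the paper's formal definitions.

The classic `No Free Lunch' minimax lower bound lets the hard distribution change with the sample size:
\[
\forall A \;\; \forall n \;\; \exists P : \quad \mathbb{E}_{S_n \sim P^n}\big[\operatorname{er}_P(A(S_n))\big] \;\geq\; c\,\frac{d}{n}.
\]
This is consistent with every \emph{fixed} distribution admitting an exponentially decaying learning curve for a given learner $A$:
\[
\forall P \;\; \exists\, C(P), c(P) > 0 \;\; \forall n : \quad \mathbb{E}_{S_n \sim P^n}\big[\operatorname{er}_P(A(S_n))\big] \;\leq\; C(P)\, e^{-c(P)\, n}.
\]
A strong minimax lower bound of $d'/n$, in the sense of Antos and Lugosi, instead keeps one distribution fixed as $n$ grows:
\[
\forall A \;\; \exists P : \quad \mathbb{E}_{S_n \sim P^n}\big[\operatorname{er}_P(A(S_n))\big] \;\geq\; \frac{d'}{n} \quad \text{for infinitely many } n.
\]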