Paper Title
Near-Optimal Statistical Query Hardness of Learning Halfspaces with Massart Noise
Paper Authors
Paper Abstract
We study the problem of PAC learning halfspaces with Massart noise. Given labeled samples $(x, y)$ from a distribution $D$ on $\mathbb{R}^{d} \times \{ \pm 1\}$ such that the marginal $D_x$ on the examples is arbitrary and the label $y$ of example $x$ is generated from the target halfspace corrupted by a Massart adversary with flipping probability $\eta(x) \leq \eta \leq 1/2$, the goal is to compute a hypothesis with small misclassification error. The best known $\mathrm{poly}(d, 1/\epsilon)$-time algorithms for this problem achieve error of $\eta + \epsilon$, which can be far from the optimal bound of $\mathrm{OPT} + \epsilon$, where $\mathrm{OPT} = \mathbf{E}_{x \sim D_x} [\eta(x)]$. While it is known that achieving $\mathrm{OPT} + o(1)$ error requires super-polynomial time in the Statistical Query model, a large gap remains between known upper and lower bounds. In this work, we essentially characterize the efficient learnability of Massart halfspaces in the Statistical Query (SQ) model. Specifically, we show that no efficient SQ algorithm for learning Massart halfspaces on $\mathbb{R}^d$ can achieve error better than $\Omega(\eta)$, even if $\mathrm{OPT} = 2^{-\log^{c} (d)}$, for any universal constant $c \in (0, 1)$. Furthermore, when the noise upper bound $\eta$ is close to $1/2$, our error lower bound becomes $\eta - o_\eta(1)$, where the $o_\eta(1)$ term goes to $0$ when $\eta$ approaches $1/2$. Our results provide strong evidence that known learning algorithms for Massart halfspaces are nearly best possible, thereby resolving a longstanding open problem in learning theory.
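For concreteness, the Massart label-generation process referred to above can be written out as follows; the halfspace $f$, weight vector $w$, and threshold $t$ are illustrative notation, since the abstract does not fix a parameterization. For an unknown target halfspace $f(x) = \mathrm{sign}(\langle w, x \rangle - t)$, each labeled example $(x, y)$ is drawn as
$$x \sim D_x, \qquad y = \begin{cases} \phantom{-}f(x) & \text{with probability } 1 - \eta(x), \\ -f(x) & \text{with probability } \eta(x), \end{cases} \qquad \eta(x) \leq \eta \leq 1/2,$$
where the adversary may choose $\eta(x)$ per point subject to the uniform bound $\eta$. The information-theoretically optimal misclassification error is then $\mathrm{OPT} = \mathbf{E}_{x \sim D_x}[\eta(x)]$, matching the definition in the abstract.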