Paper Title

Fully-Corrective Gradient Boosting with Squared Hinge: Fast Learning Rates and Early Stopping

Paper Authors

Jinshan Zeng, Min Zhang, Shao-Bo Lin

Paper Abstract

Boosting is a well-known method for improving the accuracy of weak learners in machine learning. However, its theoretical generalization guarantee is missing in the literature. In this paper, we propose an efficient boosting method with theoretical generalization guarantees for binary classification. Three key ingredients of the proposed boosting method are: a) the \textit{fully-corrective greedy} (FCG) update in the boosting procedure, b) a differentiable \textit{squared hinge} (also called \textit{truncated quadratic}) function as the loss function, and c) an efficient alternating direction method of multipliers (ADMM) algorithm for the associated FCG optimization. The squared hinge loss not only inherits the robustness of the well-known hinge loss for classification with outliers, but also brings benefits for computational implementation and theoretical justification. Under a sparseness assumption, we derive a fast learning rate of the order ${\cal O}((m/\log m)^{-1/4})$ for the proposed boosting method, which can be further improved to ${\cal O}((m/\log m)^{-1/2})$ if an additional noise assumption is imposed, where $m$ is the size of the sample set. Both derived learning rates are the best among existing generalization results for boosting-type methods for classification. Moreover, an efficient early stopping scheme is provided for the proposed method. A series of toy simulations and real-data experiments are conducted to verify the developed theory and demonstrate the effectiveness of the proposed method.
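To make the three ingredients concrete, below is a minimal sketch, not the authors' implementation, of an FCG boosting loop with the squared hinge loss $\phi(v) = \max\{0, 1-v\}^2$. It assumes the candidate weak learners are pre-evaluated as columns of a matrix `H`, and it replaces the paper's ADMM subproblem solver with a generic L-BFGS call purely for illustration; the names `fcg_boost`, `H`, and `T` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def squared_hinge(margins):
    # Squared hinge (truncated quadratic) loss: phi(v) = max(0, 1 - v)^2.
    return np.maximum(0.0, 1.0 - margins) ** 2

def fcg_boost(H, y, T=10):
    """Sketch of fully-corrective greedy (FCG) boosting.

    H : (m, n) array whose columns are the outputs of n candidate weak
        learners on the m training points (a hypothetical setup).
    y : (m,) labels in {-1, +1}.
    T : number of boosting iterations, i.e., the early stopping time.
    """
    m, n = H.shape
    selected = []
    f = np.zeros(m)  # current ensemble output
    coefs = np.zeros(0)
    for _ in range(T):
        # Greedy step: pick the weak learner most correlated with the
        # functional gradient of the squared hinge loss at f.
        grad = -2.0 * y * np.maximum(0.0, 1.0 - y * f)
        j = int(np.argmax(np.abs(H.T @ grad)))
        if j not in selected:
            selected.append(j)
        # Fully-corrective step: refit ALL coefficients of the selected
        # learners. The paper solves this subproblem with ADMM; a generic
        # L-BFGS solver stands in here for illustration only.
        Hs = H[:, selected]
        obj = lambda a: np.mean(squared_hinge(y * (Hs @ a)))
        coefs = minimize(obj, np.zeros(len(selected)), method="L-BFGS-B").x
        f = Hs @ coefs
    return selected, coefs
```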
