Paper Title
When does gradient descent with logistic loss find interpolating two-layer networks?
Paper Authors
Paper Abstract
We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss. We show that gradient descent drives the training loss to zero if the initial loss is small enough. When the data satisfies certain cluster and separation conditions and the network is wide enough, we show that one step of gradient descent reduces the loss sufficiently that the first result applies.
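To make the setting concrete, below is a minimal Python/NumPy sketch of the kind of training the abstract describes: full-batch gradient descent on the logistic loss for a finite-width two-layer network with a smoothed ReLU activation. The softplus smoothing, the fixed random output layer, and all hyperparameters are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def smoothed_relu(z, beta=10.0):
    """Softplus smoothing of ReLU (an assumed choice of smoothing);
    larger beta brings it closer to max(z, 0)."""
    # Numerically stable form of log(1 + exp(beta * z)) / beta.
    return np.maximum(z, 0.0) + np.log1p(np.exp(-beta * np.abs(z))) / beta

def smoothed_relu_grad(z, beta=10.0):
    """Derivative of the softplus smoothing: a logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-beta * z))

def train(X, y, width=256, lr=0.5, steps=500, seed=0):
    """Full-batch gradient descent on the hidden layer; the output
    weights stay fixed at random signs (an assumption for this sketch)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(width, d))   # trained hidden weights
    a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)  # fixed output weights
    loss = np.inf
    for _ in range(steps):
        pre = X @ W.T                # (n, width) pre-activations
        f = smoothed_relu(pre) @ a   # (n,) network outputs
        margins = y * f
        # Logistic loss averaged over the training set.
        loss = np.mean(np.log1p(np.exp(-margins)))
        # d(loss)/df_i = -y_i * sigmoid(-margin_i) / n
        g_f = -y / (1.0 + np.exp(margins)) / n
        # Chain rule through the smoothed ReLU back to the hidden weights.
        g_pre = (g_f[:, None] * a[None, :]) * smoothed_relu_grad(pre)
        W -= lr * (g_pre.T @ X)
    return W, a, loss
```

A quick usage example on clustered, well-separated synthetic data, loosely echoing the abstract's cluster-and-separation conditions (the data distribution here is an assumption):

```python
rng = np.random.default_rng(1)
Xpos = rng.normal(loc=+2.0, scale=0.1, size=(100, 2))
Xneg = rng.normal(loc=-2.0, scale=0.1, size=(100, 2))
X = np.concatenate([Xpos, Xneg])
y = np.concatenate([np.ones(100), -np.ones(100)])
_, _, final_loss = train(X, y)
print(f"final training loss: {final_loss:.2e}")  # driven toward zero
```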