Paper Title

Contrastive Weight Regularization for Large Minibatch SGD

Paper Authors

Qiwei Yuan, Weizhe Hua, Yi Zhou, Cunxi Yu

Paper Abstract

Minibatch stochastic gradient descent (SGD) is widely applied in deep learning because its efficiency and scalability enable training deep networks on large volumes of data. In distributed settings in particular, SGD is usually run with a large batch size. However, compared with small-batch SGD, neural network models trained with large-batch SGD tend to generalize poorly, i.e., their validation accuracy is low. In this work, we introduce a novel regularization technique, namely distinctive regularization (DReg), which replicates a certain layer of the deep network and encourages the parameters of the two layers to be diverse. The DReg technique introduces very little computational overhead. Moreover, we empirically show that optimizing a neural network with DReg using large-batch SGD yields significantly faster convergence and improved generalization performance. We also demonstrate that DReg can boost the convergence of large-batch SGD with momentum. We believe that DReg can serve as a simple regularization trick to accelerate large-batch training in deep learning.
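
The abstract only sketches how DReg works (replicate a layer and push the two copies' parameters apart). The snippet below is a minimal, hypothetical PyTorch sketch of that idea, not the paper's actual formulation: the module name `DRegLinear`, averaging the two copies' outputs, using the cosine similarity of the flattened weights as the diversity term (chosen here only because it is bounded), and the coefficient `lam` are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DRegLinear(nn.Module):
    """Hypothetical sketch of a DReg-style layer: a linear layer plus a
    replicated twin, with a penalty that rewards weight diversity.
    The exact combination rule and diversity measure in the paper may differ."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc_a = nn.Linear(in_features, out_features)
        # The replicated layer; default random init keeps it distinct so the
        # diversity term has a nonzero gradient from the start.
        self.fc_b = nn.Linear(in_features, out_features)

    def forward(self, x):
        # One possible way to combine the two copies: average their outputs.
        return 0.5 * (self.fc_a(x) + self.fc_b(x))

    def diversity_penalty(self):
        # Cosine similarity of the flattened weight matrices; minimizing it
        # pushes the two copies toward dissimilar (near-orthogonal) weights.
        return F.cosine_similarity(self.fc_a.weight.view(-1),
                                   self.fc_b.weight.view(-1), dim=0)

# Usage sketch with a large minibatch and SGD with momentum.
model = nn.Sequential(nn.Flatten(), DRegLinear(784, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
lam = 1e-3  # assumed regularization strength

x = torch.randn(4096, 1, 28, 28)   # large batch of dummy inputs
y = torch.randint(0, 10, (4096,))  # dummy labels
loss = F.cross_entropy(model(x), y) + lam * model[1].diversity_penalty()
opt.zero_grad()
loss.backward()
opt.step()
```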
