Paper Title

Contrastive Weight Regularization for Large Minibatch SGD

Paper Authors

Qiwei Yuan, Weizhe Hua, Yi Zhou, Cunxi Yu

Paper Abstract

Minibatch stochastic gradient descent (SGD) is widely applied in deep learning because its efficiency and scalability enable training deep networks on large volumes of data. In distributed settings in particular, SGD is usually run with a large batch size. However, compared with small-batch SGD, neural network models trained with large-batch SGD tend to generalize poorly, i.e., their validation accuracy is low. In this work, we introduce a novel regularization technique, namely distinctive regularization (DReg), which replicates a certain layer of the deep network and encourages the parameters of the two layers to be diverse. The DReg technique introduces very little computational overhead. Moreover, we empirically show that optimizing a neural network with DReg using large-batch SGD yields significantly faster convergence and improved generalization performance. We also demonstrate that DReg can boost the convergence of large-batch SGD with momentum. We believe that DReg can serve as a simple regularization trick to accelerate large-batch training in deep learning.
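
The abstract only sketches how DReg works (replicate a layer and push the two copies' parameters apart). The snippet below is a minimal, hypothetical PyTorch sketch of that idea, not the paper's actual formulation: the module name `DRegLinear`, averaging the two copies' outputs, using the cosine similarity of the flattened weights as the diversity term (chosen here only because it is bounded), and the coefficient `lam` are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DRegLinear(nn.Module):
    """Hypothetical sketch of a DReg-style layer: a linear layer plus a
    replicated twin, with a penalty that rewards weight diversity.
    The exact combination rule and diversity measure in the paper may differ."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc_a = nn.Linear(in_features, out_features)
        # The replicated layer; default random init keeps it distinct so the
        # diversity term has a nonzero gradient from the start.
        self.fc_b = nn.Linear(in_features, out_features)

    def forward(self, x):
        # One possible way to combine the two copies: average their outputs.
        return 0.5 * (self.fc_a(x) + self.fc_b(x))

    def diversity_penalty(self):
        # Cosine similarity of the flattened weight matrices; minimizing it
        # pushes the two copies toward dissimilar (near-orthogonal) weights.
        return F.cosine_similarity(self.fc_a.weight.view(-1),
                                   self.fc_b.weight.view(-1), dim=0)

# Usage sketch with a large minibatch and SGD with momentum.
model = nn.Sequential(nn.Flatten(), DRegLinear(784, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
lam = 1e-3  # assumed regularization strength

x = torch.randn(4096, 1, 28, 28)   # large batch of dummy inputs
y = torch.randint(0, 10, (4096,))  # dummy labels
loss = F.cross_entropy(model(x), y) + lam * model[1].diversity_penalty()
opt.zero_grad()
loss.backward()
opt.step()
```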
