Paper Title

Understanding the Impact of Model Incoherence on Convergence of Incremental SGD with Random Reshuffle

Authors

Shaocong Ma, Yi Zhou

Abstract

Although SGD with random reshuffle has been widely used in machine learning applications, there is a limited understanding of how model characteristics affect the convergence of the algorithm. In this work, we introduce model incoherence to characterize the diversity of model characteristics and study its impact on the convergence of SGD with random reshuffle under weak strong convexity. Specifically, minimizer incoherence measures the discrepancy between the global minimizers of a sample loss and those of the total loss, and it affects the convergence error of SGD with random reshuffle. In particular, we show that the variable sequence generated by SGD with random reshuffle converges to a certain global minimizer of the total loss under full minimizer coherence. The other measure, curvature incoherence, quantifies the quality of the condition numbers of the sample losses and determines the convergence rate of SGD. With model incoherence, our results show that SGD achieves a faster convergence rate and a smaller convergence error under random reshuffle than under random sampling, hence providing justification for the superior practical performance of SGD with random reshuffle.
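To make the comparison in the abstract concrete, below is a minimal sketch contrasting the two sampling schemes: random reshuffle (a fresh permutation each epoch, every sample used exactly once) versus random sampling (i.i.d. draws with replacement). The toy least-squares objective, data sizes, step size, and epoch count are illustrative assumptions, not the paper's setup; the consistent linear system is just one simple way to realize the full minimizer coherence condition, since every sample loss is then minimized at the same point.

```python
import numpy as np

# Toy finite-sum objective f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2.
# Because b is generated from a ground-truth x_star, the system is
# consistent: all sample losses share the global minimizer x_star
# (a simple instance of full minimizer coherence).
rng = np.random.default_rng(0)
n, d = 100, 10
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star

def grad_i(x, i):
    """Gradient of the i-th sample loss 0.5 * (a_i @ x - b_i)^2."""
    return A[i] * (A[i] @ x - b[i])

def sgd(x0, epochs, lr, reshuffle):
    """Incremental SGD; `reshuffle` selects the sampling scheme."""
    x = x0.copy()
    for _ in range(epochs):
        if reshuffle:
            # Random reshuffle: new permutation, each sample once per epoch.
            order = rng.permutation(n)
        else:
            # Random sampling: n i.i.d. draws with replacement per epoch.
            order = rng.integers(0, n, size=n)
        for i in order:
            x -= lr * grad_i(x, i)
    return x

x0 = np.zeros(d)
for reshuffle in (True, False):
    x = sgd(x0, epochs=50, lr=0.01, reshuffle=reshuffle)
    err = np.linalg.norm(A @ x - b) / np.sqrt(n)
    print(f"reshuffle={reshuffle}: RMS residual = {err:.2e}")
```

In this coherent toy setting both schemes converge toward the shared minimizer, and the reshuffled run typically shows the smaller residual per epoch, in line with the abstract's claim; the exact rates proved in the paper depend on the incoherence measures and are not reproduced by this sketch.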
