Paper Title
Distributed Gradient Methods for Nonconvex Optimization: Local and Global Convergence Guarantees
Authors
Abstract
The article discusses distributed gradient-descent algorithms for computing local and global minima in nonconvex optimization. For local optimization, we focus on distributed stochastic gradient descent (D-SGD), a simple network-based variant of classical SGD. We discuss convergence guarantees to local minima and explore the simple but critical role of the stable-manifold theorem in analyzing saddle-point avoidance. For global optimization, we discuss annealing-based methods in which slowly decaying noise is added to D-SGD. Conditions are discussed under which convergence to global minima is guaranteed. Numerical examples illustrate the key concepts in the paper.
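As a rough illustration of the two ingredients summarized above, the following Python sketch alternates a consensus (mixing) step over the network with local stochastic-gradient steps, and optionally adds slowly decaying Gaussian noise as in annealing-based schemes. This is a minimal sketch under assumptions of our own: the mixing matrix `W`, the step-size and noise schedules, and the helper `d_sgd` are illustrative choices, not the authors' exact algorithm.

```python
import numpy as np

def d_sgd(grad_fns, W, x0, num_iters=1000, step0=0.1, anneal=False, noise0=0.5):
    # Illustrative D-SGD sketch: W is assumed to be a doubly stochastic mixing
    # matrix for the network; grad_fns[i] returns a (possibly noisy) gradient
    # of node i's local objective; schedules below are illustrative only.
    n, d = x0.shape                      # n nodes, each holding a d-dimensional estimate
    x = x0.copy()
    for t in range(num_iters):
        step = step0 / (t + 1)           # decaying step size (illustrative schedule)
        # 1) consensus step: mix estimates with neighbors via W
        x = W @ x
        # 2) local stochastic-gradient step at every node
        grads = np.stack([grad_fns[i](x[i]) for i in range(n)])
        x = x - step * grads
        # 3) optional annealing: slowly decaying Gaussian noise (illustrative decay)
        if anneal:
            noise_scale = noise0 / np.sqrt(np.log(t + 2))
            x = x + noise_scale * np.random.randn(n, d)
    return x

# Toy example: two nodes minimizing the nonconvex function f(x) = (x^2 - 1)^2 / 4,
# whose gradient is (x^2 - 1) * x; the global minima are at x = +/- 1.
if __name__ == "__main__":
    grad = lambda x: (x**2 - 1) * x
    W = np.array([[0.5, 0.5], [0.5, 0.5]])
    x_final = d_sgd([grad, grad], W, x0=np.array([[2.0], [-0.5]]), anneal=True)
    print(x_final)
```

In this sketch, setting `anneal=False` recovers plain D-SGD, while `anneal=True` injects noise whose magnitude shrinks slowly with the iteration count, in the spirit of the annealing-based approach to global minimization described in the abstract.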