加速双重平均方法用于分散的约束优化

论文标题

加速双重平均方法用于分散的约束优化

Accelerated Dual Averaging Methods for Decentralized Constrained Optimization

论文作者

Liu, Changxin, Shi, Yang, Li, Huiping, Du, Wenli

论文摘要

在这项工作中，我们研究了网络中的分散凸的约束优化问题。我们专注于基于双平均算法框架，该算法框架有充分记录在处理约束和复杂的通信环境方面具有优越性。提出了两种新的分散双重平均（DDA）算法。在第一阶动态平均共识协议中，针对DDA型算法量身定制二阶动态平均共识协议，该算法将每个代理与比常规方案更准确地估计全局双变量。我们严格地证明，所提出的算法达到了$ \ Mathcal {o}（1/t）$ concemence for General convex和平滑问题，对此，现有的DDA方法仅在$ \ Mathcal {o}（O}（1/\ sqrt {t}）$之前，我们才能在$ \ MATHCAL {O}中收敛。在第二种中，我们使用外推技术来加速DDA的收敛性。与现有的加速算法相比，通常在代理之间交换两个不同的变量，拟议的算法仅在局部梯度上寻求共识。然后，推断是基于两个原始变量序列进行的，这些变量分别由连续两个时间瞬间的梯度积累确定。事实证明，该算法的收敛为$ \ MATHCAL {O}（1）\ left（\ frac {1} {t^2}+\ frac {1} {t（1-β）^2} \ right）$，其中$ c $表示$ c $表示$ c $ commulul of communtar commuly commutix mix commix corming mixing mixing mixing mixing mixing mixing mixing mixing torming mittrix。我们指出，保证收敛算法参数的条件不依赖于混合矩阵的光谱，从而使自己易于实践。最后，提出了数值结果以证明所提出方法的效率。

In this work, we study decentralized convex constrained optimization problems in networks. We focus on the dual averaging-based algorithmic framework that is well-documented to be superior in handling constraints and complex communication environments simultaneously. Two new decentralized dual averaging (DDA) algorithms are proposed. In the first one, a second-order dynamic average consensus protocol is tailored for DDA-type algorithms, which equips each agent with a provably more accurate estimate of the global dual variable than conventional schemes. We rigorously prove that the proposed algorithm attains $\mathcal{O}(1/t)$ convergence for general convex and smooth problems, for which existing DDA methods were only known to converge at $\mathcal{O}(1/\sqrt{t})$ prior to our work. In the second one, we use the extrapolation technique to accelerate the convergence of DDA. Compared to existing accelerated algorithms, where typically two different variables are exchanged among agents at each time, the proposed algorithm only seeks consensus on local gradients. Then, the extrapolation is performed based on two sequences of primal variables which are determined by the accumulations of gradients at two consecutive time instants, respectively. The algorithm is proved to converge at $\mathcal{O}(1)\left(\frac{1}{t^2}+\frac{1}{t(1-β)^2}\right)$, where $β$ denotes the second largest singular value of the mixing matrix. We remark that the condition for the algorithmic parameter to guarantee convergence does not rely on the spectrum of the mixing matrix, making itself easy to satisfy in practice. Finally, numerical results are presented to demonstrate the efficiency of the proposed methods.

下载PDF全文

下载文献需遵守相关版权规定

论文标题