Paper Title

A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent with Differential Privacy

Paper Authors

Jie Xu, Wei Zhang, Fei Wang

Paper Abstract

As deep learning models are usually massive and complex, distributed learning is essential for increasing training efficiency. Moreover, in many real-world application scenarios like healthcare, distributed learning can also keep the data local and protect privacy. A popular distributed learning strategy is federated learning, where there is a central server storing the global model and a set of local computing nodes updating the model parameters with their corresponding data. The updated model parameters will be processed and transmitted to the central server, which leads to heavy communication costs. Recently, asynchronous decentralized distributed learning has been proposed and demonstrated to be a more efficient and practical strategy where there is no central server, so that each computing node only communicates with its neighbors. Although no raw data will be transmitted across different local nodes, there is still a risk of information leak during the communication process for malicious participants to make attacks. In this paper, we present a differentially private version of asynchronous decentralized parallel SGD (ADPSGD) framework, or A(DP)$^2$SGD for short, which maintains communication efficiency of ADPSGD and prevents the inference from malicious participants. Specifically, R{é}nyi differential privacy is used to provide tighter privacy analysis for our composite Gaussian mechanisms while the convergence rate is consistent with the non-private version. Theoretical analysis shows A(DP)$^2$SGD also converges at the optimal $\mathcal{O}(1/\sqrt{T})$ rate as SGD. Empirically, A(DP)$^2$SGD achieves comparable model accuracy as the differentially private version of Synchronous SGD (SSGD) but runs much faster than SSGD in heterogeneous computing environments.
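To make the mechanism described in the abstract concrete, below is a minimal sketch (not the authors' implementation) of one decentralized, differentially private update: a worker averages its model with a randomly chosen neighbor and then takes a local SGD step on a clipped gradient perturbed by Gaussian noise. The least-squares loss, the helper names (clip, private_gradient, gossip_average), and all hyperparameters (clip_norm, sigma, the 0.5 mixing weight, the 0.1 learning rate) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of one A(DP)^2SGD-style step on a single worker.
# The least-squares loss and all names/hyperparameters here are illustrative
# assumptions, not the paper's actual implementation.
import numpy as np

def clip(g, clip_norm):
    """Rescale gradient g so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(g)
    return g * min(1.0, clip_norm / (norm + 1e-12))

def private_gradient(w, x, y, clip_norm=1.0, sigma=1.0, rng=None):
    """Clipped per-sample gradient of (x.w - y)^2 plus Gaussian-mechanism noise."""
    if rng is None:
        rng = np.random.default_rng()
    g = 2.0 * (x @ w - y) * x          # gradient of the squared error w.r.t. w
    g = clip(g, clip_norm)
    return g + rng.normal(0.0, sigma * clip_norm, size=g.shape)

def gossip_average(w_local, w_neighbor):
    """Pairwise model averaging with one randomly selected neighbor."""
    return 0.5 * (w_local + w_neighbor)

# Toy usage: one asynchronous-style iteration on worker i.
rng = np.random.default_rng(0)
d = 5
w_i = rng.normal(size=d)        # worker i's local model
w_j = rng.normal(size=d)        # a neighbor's model received over the network
x, y = rng.normal(size=d), 0.3  # one local training sample

w_i = gossip_average(w_i, w_j)                     # neighbor-only communication
w_i -= 0.1 * private_gradient(w_i, x, y, rng=rng)  # noisy local SGD step
print(w_i)
```

In the paper, the per-step Gaussian mechanisms are composed and accounted for with Rényi differential privacy, which yields a tighter overall privacy bound than naive composition; the sketch above only illustrates the per-step noise injection and neighbor averaging.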
