Paper Title


Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning

Paper Authors

Milad Sefidgaran, Romain Chor, Abdellatif Zaidi

Abstract


In this paper, we use tools from rate-distortion theory to establish new upper bounds on the generalization error of statistical distributed learning algorithms. Specifically, there are $K$ clients whose individually chosen models are aggregated by a central server. The bounds depend on the compressibility of each client's algorithm while keeping the other clients' algorithms uncompressed, and leverage the fact that small changes in each local model change the aggregated model by a factor of only $1/K$. Adopting a recently proposed approach by Sefidgaran et al., and extending it suitably to the distributed setting, this enables smaller rate-distortion terms, which are shown to translate into tighter generalization bounds. The bounds are then applied to distributed support vector machines (SVM), suggesting that the generalization error of the distributed setting decays faster than that of the centralized one by a factor of $\mathcal{O}(\log(K)/\sqrt{K})$. This finding is also validated experimentally. A similar conclusion is obtained for a multiple-round federated learning setup where each client uses stochastic gradient Langevin dynamics (SGLD).
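The sketch below (not from the paper) illustrates the two ingredients the abstract relies on, under the assumption that the server aggregates by simple averaging: perturbing a single client's model by some vector shifts the aggregate by only $1/K$ of that vector, and each client may update its local model with a generic SGLD step. All names (`K`, `DIM`, `aggregate`, `local_sgld_step`) and numeric values are illustrative, not the authors' implementation.

```python
# Minimal sketch of the distributed setup described in the abstract, assuming
# average-based aggregation at the server. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
K, DIM = 10, 5  # hypothetical number of clients and model dimension

def aggregate(local_models):
    """Server-side aggregation: simple average of the K client models."""
    return np.mean(local_models, axis=0)

def local_sgld_step(w, grad, lr=0.01, temperature=1.0):
    """Generic stochastic gradient Langevin dynamics update:
    w <- w - lr * grad + sqrt(2 * lr / temperature) * Gaussian noise."""
    noise = rng.normal(size=w.shape)
    return w - lr * grad + np.sqrt(2.0 * lr / temperature) * noise

# K locally chosen models (stand-ins for the clients' SVM or SGLD outputs).
local_models = rng.normal(size=(K, DIM))
w_agg = aggregate(local_models)

# One client performs a local SGLD step (dummy gradient, for illustration).
local_models[1] = local_sgld_step(local_models[1], grad=local_models[1])

# Perturb a single client's model by delta: the aggregate moves by delta / K.
delta = np.ones(DIM)
perturbed = local_models.copy()
perturbed[0] += delta
shift = aggregate(perturbed) - aggregate(local_models)
print(np.allclose(shift, delta / K))  # True: sensitivity scales as 1/K
```

This $1/K$ sensitivity of the aggregate to any single local model is what allows each client's algorithm to be "compressed" with a smaller rate-distortion term, which the paper shows translates into tighter generalization bounds.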
