Paper Title

TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels

Paper Authors

Yaodong Yu, Alexander Wei, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan

Paper Abstract

State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions. For neural networks, even when centralized SGD easily finds a solution that is simultaneously performant for all clients, current federated optimization methods fail to converge to a comparable solution. We show that this performance disparity can largely be attributed to optimization challenges presented by nonconvexity. Specifically, we find that the early layers of the network do learn useful features, but the final layers fail to make use of them. That is, federated optimization applied to this non-convex problem distorts the learning of the final layers. Leveraging this observation, we propose a Train-Convexify-Train (TCT) procedure to sidestep this issue: first, learn features using off-the-shelf methods (e.g., FedAvg); then, optimize a convexified problem obtained from the network's empirical neural tangent kernel approximation. Our technique yields accuracy improvements of up to +36% on FMNIST and +37% on CIFAR10 when clients have dissimilar data.
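For context, the convexified problem in TCT's second stage can be understood as the standard empirical neural tangent kernel linearization: a first-order Taylor expansion of the network around the weights obtained from the feature-learning stage. A minimal sketch of this idea, in our own notation (θ₀ denotes the stage-one weights; not the paper's notation):

\[
f_{\text{lin}}(x; \theta) = f(x; \theta_0) + \nabla_\theta f(x; \theta_0)^\top (\theta - \theta_0)
\]

Because f_lin is linear in θ, training it with a convex loss (e.g., squared loss) yields a convex optimization problem, which federated solvers can handle far more reliably than the original non-convex training objective.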
