Paper Title
Exploration and Exploitation in Federated Learning to Exclude Clients with Poisoned Data
Authors
Abstract
Federated Learning (FL) is a highly active research topic: it applies Machine Learning (ML) in a distributed manner without directly accessing clients' private data. However, FL faces many challenges, including the difficulty of obtaining high accuracy, the high communication cost between clients and the server, and security attacks related to adversarial ML. To tackle these three challenges, we propose an FL algorithm inspired by evolutionary techniques. The proposed algorithm randomly groups clients into many clusters, each with a randomly selected model, in order to explore the performance of different models. The clusters are then trained in a repetitive process in which the worst-performing cluster is removed in each iteration until only one cluster remains. In each iteration, some clients are expelled from clusters, either for using poisoned data or for low performance. The surviving clients are exploited in the next iteration. The remaining cluster with its surviving clients is then used to train the best FL model (i.e., the remaining FL model). Communication cost is reduced because fewer clients participate in the final training of the FL model. To evaluate the performance of the proposed algorithm, we conduct a number of experiments on the FEMNIST dataset and compare the results against the random FL algorithm. The experimental results show that the proposed algorithm outperforms the baseline algorithm in terms of accuracy, communication cost, and security.
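The exploration/exploitation loop described in the abstract could be sketched roughly as follows. This is a minimal illustrative simulation, not the paper's implementation: the `score` function, the `expel_threshold`, and the round-robin cluster assignment are all our own assumptions standing in for the paper's cluster evaluation and poisoned-client detection.

```python
import random

def evolutionary_fl(clients, models, n_clusters, score, expel_threshold):
    """Sketch of the cluster-elimination FL loop from the abstract.

    `score(model, client)` is a hypothetical per-client evaluation
    (higher is better); it stands in for validating a cluster's model
    on a client's data.
    """
    # Exploration: random client grouping, random model per cluster.
    random.shuffle(clients)
    clusters = [{"model": random.choice(models),
                 "clients": clients[i::n_clusters]}
                for i in range(n_clusters)]

    # Repeat until a single cluster survives.
    while len(clusters) > 1:
        # Expel clients scoring below a threshold (a stand-in for
        # detecting poisoned data or low performance).
        for c in clusters:
            c["clients"] = [cl for cl in c["clients"]
                            if score(c["model"], cl) >= expel_threshold]
        # Remove the worst-performing cluster this iteration.
        clusters.sort(key=lambda c: sum(score(c["model"], cl)
                                        for cl in c["clients"])
                      / max(len(c["clients"]), 1))
        clusters.pop(0)

    # Exploitation: the surviving cluster (and its surviving clients)
    # would be used for the final FL training.
    return clusters[0]
```

Because only the surviving cluster's clients take part in the final training rounds, the number of client-server exchanges drops, which is the source of the communication-cost reduction claimed in the abstract.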