Paper Title

Differentially Private Vertical Federated Learning

Paper Authors

Thilina Ranbaduge, Ming Ding

Paper Abstract

A successful machine learning (ML) algorithm often relies on a large amount of high-quality data to train a well-performing model. Supervised learning approaches, such as deep learning techniques, produce high-quality ML functions for real-life applications, but labelling the training data incurs large costs and human effort. Recent advancements in federated learning (FL) allow multiple data owners or organisations to collaboratively train a machine learning model without sharing raw data. In this light, vertical FL allows organisations to build a global model when the participating organisations hold vertically partitioned data. Further, in the vertical FL setting the participating organisations generally require fewer resources than direct data sharing, enabling lightweight and scalable distributed training solutions. However, privacy protection in vertical FL is challenging because intermediate outputs and the gradients of model updates are communicated between parties, which invites adversarial entities to infer other organisations' underlying data. Thus, in this paper, we aim to explore how to protect the privacy of individual organisations' data in a differential privacy (DP) setting. We run experiments with different real-world datasets and DP budgets. Our experimental results show that a trade-off point needs to be found, in terms of the amount of perturbation noise, to balance vertical FL performance against privacy protection.
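
The abstract does not spell out the perturbation mechanism, but a common way to realise DP in vertical FL is to clip and noise each party's intermediate outputs (or gradients) before they leave the organisation. The sketch below is a minimal, illustrative Gaussian-mechanism example in NumPy, assuming per-record L2 clipping of a linear local embedding; the names (`gaussian_sigma`, `dp_perturb`, `W_local`) and the overall flow are assumptions for illustration, not the paper's actual protocol.

```python
import numpy as np

def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    # Classic Gaussian-mechanism calibration (Dwork & Roth):
    # sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def dp_perturb(h: np.ndarray, clip_norm: float, epsilon: float,
               delta: float, rng: np.random.Generator) -> np.ndarray:
    # Clip each record's embedding to bound its L2 sensitivity,
    # then add Gaussian noise calibrated to that bound.
    row_norms = np.linalg.norm(h, axis=1, keepdims=True)
    h_clipped = h * np.minimum(1.0, clip_norm / (row_norms + 1e-12))
    sigma = gaussian_sigma(clip_norm, epsilon, delta)
    return h_clipped + rng.normal(0.0, sigma, size=h.shape)

# A passive party computes an intermediate embedding of its vertical
# feature slice and releases only the perturbed version.
rng = np.random.default_rng(0)
X_local = rng.normal(size=(32, 10))   # this party's feature columns
W_local = rng.normal(size=(10, 4))    # hypothetical local model parameters
h = X_local @ W_local                 # intermediate output
h_private = dp_perturb(h, clip_norm=1.0, epsilon=1.0, delta=1e-5, rng=rng)
# h_private is what would be sent to the party holding the labels.
```

The trade-off the authors report maps directly onto `epsilon` here: a smaller privacy budget forces a larger `sigma`, which degrades the utility of the communicated embeddings and hence the accuracy of the global model.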
