Paper Title
Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-wise Distributed Data
Paper Authors
Paper Abstract
Despite enormous research interest and rapid application of federated learning (FL) to various areas, existing studies mostly focus on supervised federated learning under the horizontally partitioned local dataset setting. This paper studies unsupervised FL under the vertically partitioned dataset setting. Accordingly, we propose the federated principal component analysis for vertically partitioned datasets (VFedPCA) method, which reduces the dimensionality across the joint dataset over all the clients and extracts the principal component feature information for downstream data analysis. We further take advantage of nonlinear dimensionality reduction and propose the vertical federated advanced kernel principal component analysis (VFedAKPCA) method, which can effectively and collaboratively model the nonlinear nature of many real datasets. In addition, we study two communication topologies. The first is a server-client topology, where a semi-trusted server coordinates the federated training, while the second is a fully decentralized topology, which eliminates the need for a server by allowing clients to communicate directly with their neighbors. Extensive experiments conducted on five types of real-world datasets corroborate the efficacy of VFedPCA and VFedAKPCA under the vertically partitioned FL setting. Code is available at: https://github.com/juyongjiang/VFedPCA-VFedAKPCA
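The abstract does not spell out the algorithmic details, so the following is only a minimal illustrative sketch of the server-client setting it describes: each client holds a disjoint block of features over the same shared samples, runs local power iteration in the shared sample space, and a server merges the clients' leading eigenvectors into one federated direction. The function names (local_power_iteration, server_aggregate), the sample-space Gram matrix, and the eigenvalue-weighted averaging are assumptions made for illustration, not necessarily the authors' exact VFedPCA procedure.

```python
import numpy as np

def local_power_iteration(X_p, num_iters=100, seed=0):
    """Client p holds its feature block X_p (n shared samples x d_p local features).
    Since the sample axis is common to all clients, work in sample space:
    form the n x n matrix X_p X_p^T / d_p and run power iteration to obtain
    the client's leading eigenvalue and n-dimensional eigenvector (assumed step)."""
    G = X_p @ X_p.T / X_p.shape[1]
    v = np.random.default_rng(seed).normal(size=G.shape[0])
    for _ in range(num_iters):
        v = G @ v
        v /= np.linalg.norm(v)
    return float(v @ G @ v), v  # (leading eigenvalue, unit eigenvector)

def server_aggregate(eigvals, eigvecs):
    """Hypothetical server step: align eigenvector signs, then average the
    clients' sample-space eigenvectors weighted by their normalized eigenvalues."""
    w = np.asarray(eigvals) / np.sum(eigvals)
    ref = eigvecs[0]
    aligned = [v * np.sign(v @ ref) for v in eigvecs]  # resolve sign ambiguity
    u = sum(wi * vi for wi, vi in zip(w, aligned))
    return u / np.linalg.norm(u)

# Toy run: 3 clients share 200 samples but hold disjoint 4-feature blocks.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 12))
blocks = np.split(X, 3, axis=1)                    # vertical (feature-wise) split
results = [local_power_iteration(Xp, seed=i) for i, Xp in enumerate(blocks)]
u_fed = server_aggregate([ev for ev, _ in results], [v for _, v in results])
loadings = [Xp.T @ u_fed for Xp in blocks]         # each client's local feature loadings
```

In the fully decentralized topology mentioned in the abstract, the role played here by server_aggregate would presumably be taken over by each client, which would combine only the eigenvectors and eigenvalues received from its neighbors; the kernel extension (VFedAKPCA) would replace the linear Gram matrix with a kernel matrix, but neither variant is detailed in the abstract and is therefore omitted from this sketch.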