论文标题
大规模网络中社区检测重叠的分布式算法
A Distributed Algorithm for Overlapped Community Detection in Large-Scale Networks
论文作者
论文摘要
社交网络中重叠的社区发现已成为一个重要的研究领域,随着网络的普及和复杂性的增加。大多数现有的解决方案是集中算法或并行算法,这些算法是计算密集型的 - 需要对整个网络的完全了解。但是,收集整个网络数据并不容易,因为实际网络的大小可能很大。这可能是由于隐私问题或技术障碍的结果。使用网络各个节点的计算能力,执行网络内计算解决了这两个问题。同时,节点通过消息传递与邻居进行交流并共享数据,这可能对缓解各个节点在网络中的隐私问题有很大帮助。上述所有问题都促使我们设计一种分散或分布式技术,以检测大型网络中的重叠社区。这是理想的,因为该技术没有提供单个故障点,即使许多节点失败,整个系统也可以继续运行。在本文中,我们解决了大规模网络重叠的社区检测问题。我们提出了一种名为DOCD的有效分布式算法,以识别网络中重叠的社区。 DOCD算法的效率通过有关真实网络数据的大量模拟研究得到了验证,例如Dolphin,Zachary空手道俱乐部,足球俱乐部和Facebook Ego网络。我们表明,DOCD算法能够在社区模块化和已识别社区的数量方面与现有的经典集中式算法保持渐近的结果。 DOCD算法还可以有效地识别具有少量通信和计算的重叠节点和重叠的社区。
Overlapped community detection in social networks has become an important research area with the increasing popularity and complexity of the networks. Most of the existing solutions are either centralized or parallel algorithms, which are computationally intensive - require complete knowledge of the entire networks. But it isn't easy to collect entire network data because the size of the actual networks may be prohibitively large. This may be a result of either privacy concerns or technological impediments. Performing in-network computation solves both problems utilizing the computational capability of the individual nodes of the network. Simultaneously, nodes communicate and share data with their neighbors via message passing, which may go a long way toward mitigating individual nodes' privacy concerns in the network. All the aforementioned concerns motivated us to design a decentralized or distributed technique to detect overlapped communities in a large-scale network. It is desirable because this technique does not offer a single point of failure, and the system as a whole can continue to function even when many of the nodes fail. In this paper, we address the overlapped community detection problem for large-scale networks. We present an efficient distributed algorithm, named DOCD, to identify the overlapped communities in the network. The DOCD algorithm's efficiency is verified with extensive simulation study on real network data such as Dolphin, Zachary karate club, Football club, and Facebook ego networks. We show that DOCD algorithm is capable of keeping the asymptotically same results with the existing classical centralized algorithms in terms of community modularity and the number of identified communities. The DOCD algorithm can also efficiently identify the overlapped nodes and overlapped communities with a small number of rounds of communication and computation.