Paper Title
Mixed Membership Graph Clustering via Systematic Edge Query
Authors
Abstract
This work considers clustering the nodes of a largely incomplete graph. In this setting, only a small number of edge queries can be made, and the entire graph is not observable. This problem finds applications in large-scale data clustering with limited annotations, community detection under restricted survey resources, and graph topology inference with hidden or removed node interactions. Prior works have tackled this problem from various perspectives, e.g., convex programming-based low-rank matrix completion and active query-based clique finding. Nonetheless, many existing methods are designed to estimate the single-cluster membership of nodes, whereas in practice nodes often have mixed (i.e., multi-cluster) membership. In addition, some query and computational paradigms, e.g., the random query patterns and nuclear norm-based optimization advocated in the convex approaches, may give rise to scalability and implementation challenges. This work aims at learning the mixed membership of nodes from queried edges. The proposed method is developed together with a systematic query principle that system designers can control and adjust to accommodate implementation challenges, e.g., to avoid querying edges that are physically hard to acquire. Our framework also features a lightweight and scalable algorithm with membership learning guarantees. Real-data experiments on crowdclustering and community detection showcase the effectiveness of our method.
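To make the problem setting concrete, the following is a minimal sketch of the data model only, not the authors' algorithm or their query principle. It assumes a mixed-membership stochastic block model in which node i has a membership vector m_i on the probability simplex and nodes i and j are linked with probability m_i^T B m_j for a cluster-connectivity matrix B. The anchor-based query pattern and the `forbidden` set of hard-to-acquire pairs are hypothetical illustrations of a designer-controlled ("systematic") pattern, as opposed to querying uniformly random edges; all function names are illustrative.

```python
import random

def mixed_membership(n, k, rng):
    """Draw each node's membership vector on the probability simplex (rows sum to 1)."""
    rows = []
    for _ in range(n):
        w = [rng.random() for _ in range(k)]
        s = sum(w)
        rows.append([x / s for x in w])
    return rows

def edge_prob(m, b, i, j):
    """Assumed model: P[i][j] = m_i^T B m_j, the probability that nodes i and j are linked."""
    k = len(b)
    return sum(m[i][a] * b[a][c] * m[j][c] for a in range(k) for c in range(k))

def systematic_queries(n, anchors, forbidden=frozenset()):
    """Hypothetical designer-controlled pattern: query every edge touching an
    anchor node, skipping pairs the designer marks as hard to acquire."""
    pairs = set()
    for a in anchors:
        for j in range(n):
            if j != a:
                pair = (min(a, j), max(a, j))
                if pair not in forbidden:
                    pairs.add(pair)
    return sorted(pairs)

def query_edges(m, b, pairs, rng):
    """Simulate the 0/1 answer returned for each queried edge."""
    return {p: int(rng.random() < edge_prob(m, b, *p)) for p in pairs}

rng = random.Random(0)
n, k = 8, 2
M = mixed_membership(n, k, rng)
B = [[0.9, 0.1], [0.1, 0.8]]
pairs = systematic_queries(n, anchors=[0, 1], forbidden={(0, 5)})
answers = query_edges(M, B, pairs, rng)
print(len(pairs), "of", n * (n - 1) // 2, "edges queried")  # 12 of 28
```

Only the queried entries in `answers` would be available to a membership-learning method; the rest of the graph stays unobserved. Changing `anchors` or `forbidden` is the kind of knob a system designer could adjust, which a uniformly random query pattern does not offer.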