论文标题
生成性超晶模型和光谱嵌入
Generative Hypergraph Models and Spectral Embedding
论文作者
论文摘要
许多复杂的系统涉及两个以上代理之间的相互作用。 HyperGraphs通过超蛋白质捕获了这些高阶相互作用,这些杂音可能会连接两个以上的节点。我们考虑将超颗粒嵌入低维欧几里得空间中的问题,以使大多数相互作用都是短距离的。该嵌入与许多后续任务有关,例如节点重新排序,聚类和可视化。我们专注于分别恢复线性和周期性结构的两个光谱嵌入算法。在周期性的情况下,节点位于单位圆上。我们表明,两种光谱超图嵌入算法与新的生成超图模型相关联。这些模型会根据嵌入式空间中的节点位置生成超系统,并鼓励短距离连接。它们使我们能够通过最大可能性量化数据中周期性和线性结构的相对存在。它们还提高了节点嵌入的解释性,并为超边缘预测提供了指标。我们在合成和现实世界中的超图上演示了HyperGraph嵌入和后续任务 - 包括结构定量,聚类和超轨预测。我们发现,HyperGraph方法可以超越仅使用二元边缘的聚类算法。我们还比较了在高中接触数据上的几种三元边缘预测方法,在该数据中,当培训数据的量受到限制时,我们的算法会改善基准方法。
Many complex systems involve interactions between more than two agents. Hypergraphs capture these higher-order interactions through hyperedges that may link more than two nodes. We consider the problem of embedding a hypergraph into low-dimensional Euclidean space so that most interactions are short-range. This embedding is relevant to many follow-on tasks, such as node reordering, clustering, and visualization. We focus on two spectral embedding algorithms customized to hypergraphs which recover linear and periodic structures respectively. In the periodic case, nodes are positioned on the unit circle. We show that the two spectral hypergraph embedding algorithms are associated with a new class of generative hypergraph models. These models generate hyperedges according to node positions in the embedded space and encourage short-range connections. They allow us to quantify the relative presence of periodic and linear structures in the data through maximum likelihood. They also improve the interpretability of node embedding and provide a metric for hyperedge prediction. We demonstrate the hypergraph embedding and follow-on tasks -- including structure quantification, clustering and hyperedge prediction -- on synthetic and real-world hypergraphs. We find that the hypergraph approach can outperform clustering algorithms that use only dyadic edges. We also compare several triadic edge prediction methods on high school contact data where our algorithm improves upon benchmark methods when the amount of training data is limited.