Paper Title

Graph InfoClust: Leveraging cluster-level node information for unsupervised graph representation learning

Paper Authors

Costas Mavromatis, George Karypis

Abstract

Unsupervised (or self-supervised) graph representation learning is essential to facilitate various graph data mining tasks when external supervision is unavailable. The challenge is to encode the information about the graph structure and the attributes associated with the nodes and edges into a low-dimensional space. Most existing unsupervised methods promote similar representations across nodes that are topologically close. Recently, it was shown that leveraging additional graph-level information, e.g., information that is shared among all nodes, encourages the representations to be mindful of the global properties of the graph, which greatly improves their quality. However, in most graphs there is significantly more structure that can be captured, e.g., nodes tend to belong to (multiple) clusters that represent structurally similar nodes. Motivated by this observation, we propose a graph representation learning method called Graph InfoClust (GIC), which seeks to additionally capture cluster-level information content. These clusters are computed by a differentiable K-means method and are jointly optimized by maximizing the mutual information between nodes of the same clusters. This optimization leads the node representations to capture richer information and nodal interactions, which improves their quality. Experiments show that GIC outperforms state-of-the-art methods in various downstream tasks (node classification, link prediction, and node clustering), with an average gain of 0.9% to 6.1% over the best competing approach.
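
The abstract describes two core ingredients: a differentiable (soft) K-means step that assigns node embeddings to learnable cluster centroids, and a mutual-information objective that contrasts each node with a summary of the cluster(s) it belongs to. The snippet below is a minimal PyTorch sketch of how those two ideas could fit together, not the authors' implementation: the class name `ClusterInfoLoss`, the temperature value, the bilinear DGI-style discriminator, and the way corrupted embeddings are obtained (e.g., by shuffling node features) are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the GIC reference code) of:
# (1) differentiable (soft) K-means assignments over node embeddings, and
# (2) a mutual-information-style contrastive loss between nodes and the
#     summaries of their own clusters, using corrupted nodes as negatives.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ClusterInfoLoss(nn.Module):
    def __init__(self, dim, num_clusters, temperature=10.0):
        super().__init__()
        # Learnable cluster centroids, optimized jointly with the encoder.
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))
        self.temperature = temperature
        # Bilinear discriminator scoring (node embedding, cluster summary) pairs.
        self.discriminator = nn.Bilinear(dim, dim, 1)

    def forward(self, z, z_corrupted):
        # z: (N, dim) node embeddings from any graph encoder (e.g., a GCN);
        # z_corrupted: embeddings of a corrupted graph (e.g., shuffled features).

        # Soft K-means: cosine similarity to each centroid, softmax-normalized.
        sim = self.temperature * F.normalize(z, dim=1) @ F.normalize(self.centroids, dim=1).t()
        assign = F.softmax(sim, dim=1)                      # (N, K) soft assignments

        # Cluster summaries = assignment-weighted means of node embeddings.
        summaries = (assign.t() @ z) / (assign.sum(0, keepdim=True).t() + 1e-8)  # (K, dim)

        # Each node is paired with the summary of the clusters it belongs to.
        node_summary = torch.sigmoid(assign @ summaries)    # (N, dim)

        # Real nodes should score high against their cluster summary,
        # corrupted nodes should score low.
        pos = self.discriminator(z, node_summary).squeeze(-1)
        neg = self.discriminator(z_corrupted, node_summary).squeeze(-1)
        logits = torch.cat([pos, neg])
        labels = torch.cat([torch.ones_like(pos), torch.zeros_like(neg)])
        return F.binary_cross_entropy_with_logits(logits, labels)
```

In a training loop, a loss like this would presumably be added to a graph-level objective (in the spirit of Deep Graph Infomax) and backpropagated through both the encoder and the centroids, so that the clusters and the node representations are learned jointly, as the abstract states.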
