GOOD-D: On Unsupervised Graph Out-Of-Distribution Detection

Paper Title

GOOD-D: On Unsupervised Graph Out-Of-Distribution Detection

Authors

Yixin Liu, Kaize Ding, Huan Liu, Shirui Pan

Abstract

Most existing deep learning models are trained under the closed-world assumption, in which test data are assumed to be drawn i.i.d. from the same distribution as the training data, known as in-distribution (ID). However, when models are deployed in an open-world scenario, test samples can be out-of-distribution (OOD) and should therefore be handled with caution. To detect such OOD samples drawn from an unknown distribution, OOD detection has received increasing attention lately. However, current efforts mostly focus on grid-structured data, and its application to graph-structured data remains under-explored. Because labeling graph data is commonly time-consuming and labor-intensive, in this work we study the problem of unsupervised graph OOD detection, which aims to detect OOD graphs based solely on unlabeled ID data. To achieve this goal, we develop a new graph contrastive learning framework, GOOD-D, for detecting OOD graphs without using any ground-truth labels. By performing hierarchical contrastive learning on the augmented graphs generated by our perturbation-free graph data augmentation method, GOOD-D is able to capture the latent ID patterns and accurately detect OOD graphs based on semantic inconsistency at different granularities (i.e., node level, graph level, and group level). As a pioneering work in unsupervised graph-level OOD detection, we build a comprehensive benchmark to compare our proposed approach with different state-of-the-art methods. The experimental results demonstrate the superiority of our approach over different methods on various datasets.
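The abstract describes scoring a graph by how inconsistent its two augmented views are at several granularities. The following is a minimal sketch of that general idea only, not the GOOD-D implementation: it assumes node embeddings for two views are already available as plain lists, uses cosine distance as the inconsistency measure, covers only the node and graph levels (the group level is omitted), and uses hypothetical function names (`cosine_similarity`, `mean_vector`, `ood_score`) chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Mean-pool a list of node embeddings into one graph embedding."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def ood_score(view_a_nodes, view_b_nodes):
    """Hypothetical OOD score: higher means the two views disagree more.

    Assumption: view_a_nodes[i] and view_b_nodes[i] embed the same node.
    """
    # Node level: average cosine distance between matching node embeddings.
    node_term = sum(
        1.0 - cosine_similarity(a, b)
        for a, b in zip(view_a_nodes, view_b_nodes)
    ) / len(view_a_nodes)
    # Graph level: cosine distance between the mean-pooled embeddings.
    graph_term = 1.0 - cosine_similarity(
        mean_vector(view_a_nodes), mean_vector(view_b_nodes)
    )
    return node_term + graph_term


# Hand-made toy embeddings (illustrative values, not real data).
consistent = ood_score([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
inconsistent = ood_score([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(consistent < inconsistent)
```

In this toy setup the graph whose views agree node by node receives the lower score. In practice a threshold on the score would have to be chosen from scores observed on ID data, which this sketch does not attempt.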
