Paper Title
Efficient block contrastive learning via parameter-free meta-node approximation
Paper Authors
Paper Abstract
Contrastive learning has recently achieved remarkable success in many domains, including graphs. However, contrastive loss, especially for graphs, requires a large number of negative samples, which is unscalable and computationally prohibitive, with quadratic time complexity. Sub-sampling is not optimal, and incorrect negative sampling leads to sampling bias. In this work, we propose a meta-node based approximation technique that can (a) proxy all negative combinations, (b) in quadratic cluster-size time complexity, (c) at graph level, not node level, and (d) exploit graph sparsity. By replacing node-pairs with additive cluster-pairs, we compute the negatives in cluster-time at graph level. The resulting Proxy approximated meta-node Contrastive (PamC) loss, based on simple optimized GPU operations, captures the full set of negatives, yet is efficient with linear time complexity. By avoiding sampling, we effectively eliminate sampling bias. We meet the criterion for a larger number of samples, thus achieving block-contrastiveness, which is proven to outperform pair-wise losses. We use learnt soft cluster assignments for the meta-node construction, and avoid possible heterophily and noise added during edge creation. Theoretically, we show that real-world graphs easily satisfy the conditions necessary for our approximation. Empirically, we show promising accuracy gains over state-of-the-art graph clustering on 6 benchmarks. Importantly, we gain substantially in efficiency: up to 3x in training time, 1.8x in inference time, and over 5x in GPU memory reduction.
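The central idea in the abstract, replacing the O(n^2) node-pair negative term with similarities over k meta-nodes built from soft cluster assignments, can be sketched as follows. This is a minimal illustration reconstructed from the abstract alone, assuming PyTorch; the function name `pamc_negative_proxy`, the temperature `tau`, and the exact aggregation are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def pamc_negative_proxy(z: torch.Tensor, c: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Cluster-level proxy for the full negative term of a contrastive loss.

    z: (n, d) node embeddings.
    c: (n, k) soft cluster assignments (rows sum to 1), with k << n.

    Instead of scoring all O(n^2) node pairs, nodes are additively
    aggregated into k meta-nodes, and only the O(k^2) meta-node pairs
    are scored, which is what makes the negatives cheap to compute.
    """
    z = F.normalize(z, dim=1)
    meta = c.t() @ z                       # (k, d) additive meta-node embeddings
    meta = F.normalize(meta, dim=1)
    sim = meta @ meta.t() / tau            # (k, k) cluster-pair similarities
    # Sum over distinct cluster pairs as a proxy for the full negative set.
    exp_sim = torch.exp(sim)
    return exp_sim.sum() - torch.diagonal(exp_sim).sum()

# Toy usage: 1000 nodes, 64-dim embeddings, 10 soft clusters.
z = torch.randn(1000, 64)
c = torch.softmax(torch.randn(1000, 10), dim=1)
neg = pamc_negative_proxy(z, c)
```

Under this reading, the per-step cost of the negative term drops from O(n^2 d) to O(nkd + k^2 d), linear in n for fixed k, which matches the linear time complexity and the avoidance of negative sampling claimed in the abstract.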