Paper Title
A pipeline for fair comparison of graph neural networks in node classification tasks
Paper Authors
Paper Abstract
Graph neural networks (GNNs) have been investigated for potential applicability in multiple fields that employ graph data. However, there are no standard training settings that ensure fair comparisons among new methods, including different model architectures and data augmentation techniques. We introduce a standard, reproducible benchmark in which the same training settings can be applied for node classification. For this benchmark, we construct 9 datasets, spanning small- and medium-scale datasets from different fields, and 7 different models. We design a k-fold model assessment strategy for the small datasets and a standard set of model training procedures for all datasets, yielding a standard experimental pipeline for GNNs that helps ensure fair comparisons of model architectures. We use node2vec and Laplacian eigenvectors for data augmentation to investigate how input features affect model performance. We find that topological information is important for node classification tasks. Increasing the number of model layers does not improve performance, except on the PATTERN and CLUSTER datasets, in which the graphs are not connected. Data augmentation is highly useful, especially adding node2vec features to the baseline, which yields a substantial improvement in baseline performance.
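As a rough illustration of the data-augmentation step described in the abstract, the sketch below concatenates Laplacian eigenvector positional encodings (and optionally precomputed node2vec embeddings) onto the raw node feature matrix. This is not the paper's code: the function names (`laplacian_eigenvector_pe`, `augment_node_features`), the choice of NetworkX/NumPy, and the default of 8 eigenvectors are illustrative assumptions, and the node2vec embeddings are assumed to come from any off-the-shelf implementation.

```python
# Minimal sketch of feature augmentation for node classification (illustrative, not the paper's code).
import numpy as np
import networkx as nx


def laplacian_eigenvector_pe(graph: nx.Graph, k: int) -> np.ndarray:
    """Return k nontrivial eigenvectors of the symmetric normalized Laplacian,
    used as per-node positional encodings."""
    L = nx.normalized_laplacian_matrix(graph).toarray()
    _, eigvecs = np.linalg.eigh(L)          # eigenvectors sorted by ascending eigenvalue
    return eigvecs[:, 1:k + 1]              # drop the trivial first eigenvector


def augment_node_features(x, graph, n2v_embeddings=None, k_eig=8):
    """Concatenate Laplacian positional encodings (and, if given, precomputed
    node2vec embeddings) onto the original node feature matrix x."""
    parts = [np.asarray(x), laplacian_eigenvector_pe(graph, k_eig)]
    if n2v_embeddings is not None:          # e.g. produced by any node2vec library
        parts.append(np.asarray(n2v_embeddings))
    return np.concatenate(parts, axis=1)
```

In a pipeline of this kind, the augmented matrix would simply replace the original node features as GNN input; for graphs without native node features, the structural encodings alone could serve as the input, giving a topology-only baseline.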