Paper Title

Distill2Vec: Dynamic Graph Representation Learning with Knowledge Distillation

Paper Authors

Stefanos Antaris, Dimitrios Rafailidis

Paper Abstract

Dynamic graph representation learning strategies are based on different neural architectures to capture the graph evolution over time. However, the underlying neural architectures require a large number of parameters to train and suffer from high online inference latency, that is, several model parameters have to be updated when new data arrive online. In this study we propose Distill2Vec, a knowledge distillation strategy to train a compact model with a low number of trainable parameters, so as to reduce the latency of online inference and keep the model accuracy high. We design a distillation loss function based on the Kullback-Leibler divergence to transfer the acquired knowledge from a teacher model trained on offline data to a small-size student model for online data. Our experiments with publicly available datasets show the superiority of our proposed model over several state-of-the-art approaches, with relative gains of up to 5% in the link prediction task. In addition, we demonstrate the effectiveness of our knowledge distillation strategy in terms of the number of required parameters, where Distill2Vec achieves a compression ratio of up to 7:100 when compared with baseline approaches. For reproduction purposes, our implementation is publicly available at https://stefanosantaris.github.io/Distill2Vec.
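The central ingredient described in the abstract is a Kullback-Leibler distillation loss that transfers knowledge from an offline-trained teacher to a compact online student. The paper's exact formulation is not given here, so the snippet below is only a minimal PyTorch sketch of a temperature-scaled KL distillation loss; the temperature value, tensor shapes, and the use of link-prediction logits are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Minimal sketch of a KL-divergence distillation loss.

    The teacher (trained on offline data) provides soft targets; the compact
    student (updated on online data) is trained to match them. The temperature
    value here is an illustrative assumption, not taken from the paper.
    """
    # Soften both output distributions with the temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 as is common in distillation setups.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Hypothetical usage with random link-prediction scores for a batch of node pairs.
teacher_logits = torch.randn(32, 2)   # large teacher model, trained offline
student_logits = torch.randn(32, 2)   # compact student model, served online
loss = distillation_loss(student_logits, teacher_logits)
```

In practice such a term would be combined with the student's own task loss (e.g. for link prediction), so that the small model both fits the online data and mimics the teacher's soft predictions.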
