Paper Title

Efficient, Simple and Automated Negative Sampling for Knowledge Graph Embedding

Paper Authors

Yongqi Zhang, Quanming Yao, Lei Chen

Paper Abstract

Negative sampling, which samples negative triplets from the non-observed ones in a knowledge graph (KG), is an essential step in KG embedding. Recently, generative adversarial networks (GANs) have been introduced into negative sampling. By sampling negative triplets with large gradients, these methods avoid the vanishing-gradient problem and thus obtain better performance. However, they make the original model more complex and harder to train. In this paper, motivated by the observation that negative triplets with large gradients are important but rare, we propose to directly keep track of them with a cache; we call the resulting method NSCaching. In this way, our method acts as a "distilled" version of previous GAN-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. However, how to sample from and how to update the cache are two critical questions. We propose to solve them with automated machine learning techniques; the automated version also covers GAN-based methods as special cases. A theoretical explanation of NSCaching is also provided, justifying its superiority over fixed sampling schemes. Besides, we further extend NSCaching with the skip-gram model for graph embedding. Finally, extensive experiments show that our method gains significant improvements on various KG embedding models and the skip-gram model, and outperforms state-of-the-art negative sampling methods.
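
The abstract describes the core mechanism only at a high level: maintain a cache of the rare, large-gradient ("hard") negative triplets, sample negatives from it, and keep it fresh as the embeddings change. Below is a minimal sketch of that cache-based idea, not the authors' reference implementation: the class name, the `score_fn` callback, and all parameter values are hypothetical, and it uses plain uniform sampling and a greedy top-k refresh where the paper's automated (AutoML) version would search over these choices.

```python
import numpy as np

class NegativeCache:
    """Sketch of cache-based negative sampling in the spirit of NSCaching.

    For each positive triplet we keep a small cache of candidate negative
    tails. Negatives the model scores highly are kept, since they produce
    large gradients during KG-embedding training.
    """

    def __init__(self, num_entities, cache_size=50, refresh_size=200, seed=0):
        self.num_entities = num_entities
        self.cache_size = cache_size      # negatives kept per positive triplet
        self.refresh_size = refresh_size  # fresh random candidates per update
        self.rng = np.random.default_rng(seed)
        self.cache = {}                   # positive triplet -> array of tail ids

    def _random_tails(self, n):
        return self.rng.integers(0, self.num_entities, n)

    def sample(self, triplet):
        """Uniformly pick one cached negative tail for a positive triplet."""
        head, rel, _ = triplet
        ids = self.cache.setdefault(triplet, self._random_tails(self.cache_size))
        return (head, rel, int(self.rng.choice(ids)))

    def update(self, triplet, score_fn):
        """Refresh the cache: mix in random candidates, keep the top scorers.

        score_fn(head, rel, tails) -> array of model scores; a higher score
        marks a harder negative, which is worth keeping in the cache.
        """
        head, rel, _ = triplet
        old = self.cache.get(triplet, self._random_tails(self.cache_size))
        candidates = np.concatenate([old, self._random_tails(self.refresh_size)])
        scores = np.asarray(score_fn(head, rel, candidates))
        self.cache[triplet] = candidates[np.argsort(-scores)[: self.cache_size]]
```

The two critical questions the abstract raises map onto the two methods above: the automated version of NSCaching would replace the uniform draw in `sample` and the greedy top-k rule in `update` with searched sampling and updating strategies, of which a GAN-style sampler is one special case.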
