Paper Title

A Comparative Study on Structural and Semantic Properties of Sentence Embeddings

Authors

Alexander Kalinowski, Yuan An

Abstract

Sentence embeddings encode natural language sentences as low-dimensional dense vectors. A great deal of effort has gone into using sentence embeddings to improve several important natural language processing (NLP) tasks. Relation extraction is one such task: it aims to identify, in unstructured text, structured relations defined in a knowledge base. A promising and more efficient approach is to embed both the text and the structured knowledge in low-dimensional spaces and discover semantic alignments or mappings between them. Although a number of techniques have been proposed in the literature for embedding both sentences and knowledge graphs, little is known about the structural and semantic properties of these embedding spaces with respect to relation extraction. In this paper, we investigate these properties by evaluating the extent to which sentences carrying similar senses are embedded in close-proximity sub-spaces, and whether we can exploit that structure to align sentences to a knowledge graph. We propose a set of experiments on a widely used large-scale data set for relation extraction, focusing on a set of key sentence embedding methods. These methods cover a wide variety of techniques, ranging from simple combinations of word embeddings to transformer-based BERT-style models. We additionally provide code for reproducing these experiments at https://github.com/akalino/semantic-structural-sentences. Our experimental results show that the embedding spaces differ in how strongly they exhibit the structural and semantic properties. These results provide useful information for developing embedding-based relation extraction methods.
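The question of whether sentences carrying similar senses land close together can be made concrete with a simple proxy score. The sketch below is not the authors' evaluation protocol (their repository holds the actual experiments). It assumes a hypothetical k-nearest-neighbour label-agreement score under cosine similarity, and it builds sentence vectors by averaging word vectors, the simplest form of the "combination of word embeddings" mentioned above. The toy vocabulary, vectors, relation labels, and all function names are invented for illustration.

```python
import math


def mean_embedding(tokens, word_vectors):
    """Sentence vector = average of the vectors of its in-vocabulary tokens."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        raise ValueError("sentence has no in-vocabulary tokens")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def knn_label_agreement(embeddings, labels, k=1):
    """Hypothetical proxy: for each sentence, the fraction of its k nearest
    neighbours (by cosine, excluding itself) that share its relation label,
    averaged over all sentences. 1.0 means same-relation sentences cluster."""
    n = len(embeddings)
    if not 1 <= k < n:
        raise ValueError("k must be between 1 and len(embeddings) - 1")
    total = 0.0
    for i in range(n):
        sims = [(cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)  # most similar first
        neighbours = [j for _, j in sims[:k]]
        total += sum(labels[j] == labels[i] for j in neighbours) / k
    return total / n


# Toy 2-d "word vectors" and relation-labelled sentences (invented data).
word_vectors = {
    "born": [1.0, 0.0], "in": [0.5, 0.5], "paris": [0.9, 0.1], "london": [0.8, 0.2],
    "works": [0.0, 1.0], "for": [0.1, 0.9], "google": [0.2, 0.8], "ibm": [0.1, 1.0],
}
sentences = [
    ["born", "in", "paris"], ["born", "in", "london"],
    ["works", "for", "google"], ["works", "for", "ibm"],
]
labels = ["place_of_birth", "place_of_birth", "employer", "employer"]

embeddings = [mean_embedding(s, word_vectors) for s in sentences]
score = knn_label_agreement(embeddings, labels, k=1)
print(f"k=1 label agreement: {score:.2f}")  # prints 1.00
```

Swapping `mean_embedding` for the output of a BERT-style encoder leaves the scoring function unchanged, which is the point of using a model-agnostic proxy like this when comparing embedding spaces.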