Paper title
From Bag of Sentences to Document: Distantly Supervised Relation Extraction via Machine Reading Comprehension
Paper authors
Abstract
Distant supervision (DS) is a promising approach for relation extraction but often suffers from the noisy label problem. Traditional DS methods usually represent an entity pair as a bag of sentences and denoise labels using multi-instance learning techniques. The bag-based paradigm, however, fails to leverage inter-sentence-level and entity-level evidence for relation extraction, and its denoising algorithms are often specialized and complicated. In this paper, we propose a new DS paradigm, document-based distant supervision, which models relation extraction as a document-based machine reading comprehension (MRC) task. By re-organizing all sentences about an entity as a document and extracting relations by querying the document with relation-specific questions, the document-based DS paradigm can simultaneously encode and exploit all sentence-level, inter-sentence-level, and entity-level evidence. Furthermore, we design a new loss function, DSLoss (distant supervision loss), which can effectively train MRC models using only $\langle$document, question, answer$\rangle$ tuples, so that the noisy label problem can be inherently resolved. Experiments show that our method achieves new state-of-the-art DS performance.
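The data re-organization described in the abstract can be illustrated with a minimal sketch. It assumes sentences arrive already tagged with the entities they mention, and that each relation has a hand-written question template. The names `QUESTION_TEMPLATES`, `build_documents`, and `build_tuples`, the template wording, and the sample data are all hypothetical illustrations and are not taken from the paper. The sketch covers only the construction of $\langle$document, question, answer$\rangle$ tuples; it does not implement the MRC model or DSLoss.

```python
from collections import defaultdict

# Hypothetical question templates; the paper's actual question wording is not given here.
QUESTION_TEMPLATES = {
    "place_of_birth": "Where was {entity} born?",
    "founder_of": "Which organization did {entity} found?",
}


def build_documents(sentences):
    """Group every sentence that mentions an entity into one document per entity."""
    docs = defaultdict(list)
    for text, entities in sentences:
        for entity in entities:
            docs[entity].append(text)
    # Join the sentences in their original order to form the document text.
    return {entity: " ".join(texts) for entity, texts in docs.items()}


def build_tuples(documents, kb_triples):
    """Turn KB triples (head, relation, tail) into (document, question, answer) tuples."""
    tuples = []
    for head, relation, tail in kb_triples:
        if head in documents and relation in QUESTION_TEMPLATES:
            question = QUESTION_TEMPLATES[relation].format(entity=head)
            tuples.append((documents[head], question, tail))
    return tuples


# Illustrative data only (an assumption, not from the paper's datasets).
sentences = [
    ("Barack Obama was born in Honolulu.", ["Barack Obama", "Honolulu"]),
    ("Barack Obama served as U.S. president.", ["Barack Obama"]),
]
kb_triples = [("Barack Obama", "place_of_birth", "Honolulu")]

documents = build_documents(sentences)
examples = build_tuples(documents, kb_triples)
```

In this sketch the answer is simply the tail entity of a knowledge-base triple, so no sentence-level label is ever assigned; this mirrors the abstract's claim that training needs only document-level tuples.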