PERLEX: A Bilingual Persian-English Gold Dataset for Relation Extraction

Paper Title

PERLEX: A Bilingual Persian-English Gold Dataset for Relation Extraction

Authors

Majid Asgari-Bidhendi, Mehrdad Nasser, Behrooz Janfada, Behrouz Minaei-Bidgoli

Abstract

Relation extraction is the task of extracting semantic relations between entities in a sentence. It is an essential part of several natural language processing tasks, such as information extraction, knowledge extraction, and knowledge base population. This research is motivated by the lack of a relation extraction dataset for the Persian language and by the need to extract knowledge from the growing body of big data in Persian for different applications. In this paper, we present "PERLEX", the first Persian dataset for relation extraction, which is an expert-translated version of the "Semeval-2010-Task-8" dataset. Moreover, this paper addresses Persian relation extraction using state-of-the-art language-agnostic algorithms. We employ six different models for relation extraction on the proposed bilingual dataset: a non-neural model (as the baseline), three neural models, and two deep learning models fed by multilingual-BERT contextual word representations. The experiments yield a maximum F-score of 77.66% (obtained by the BERTEM-MTB method), which is the state of the art for relation extraction in the Persian language.
