Paper Title
OASYS: Domain-Agnostic Automated System for Constructing Knowledge Base from Unstructured Text
Paper Authors
Abstract
In recent years, creating and managing knowledge bases has become crucial in the retail product and enterprise domains. We present an automatic knowledge base construction system that mines data from documents. The system generates its own training data during the training process, without human intervention. It is therefore domain-agnostic: it can be trained using only a text corpus from the target domain and a pre-defined knowledge base. The system is called OASYS, and it is the first such system built with the Korean language in mind. To aid system evaluation, we have also constructed a new human-annotated benchmark dataset that pairs the Korean Wikipedia corpus with the Korean DBpedia. The system's performance on the human-annotated benchmark test set is meaningful and shows that the knowledge base generated by OASYS, trained only on auto-generated data, is useful. We provide both the human-annotated test dataset and the auto-generated dataset.
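The abstract does not say how OASYS produces its training data. As a rough illustration only, one common way to derive labeled examples from just a text corpus and a pre-defined knowledge base is distant-supervision-style matching: a sentence is labeled with a relation when it mentions both the subject and the object of a known triple. The sketch below assumes that technique. The names `Triple`, `Example`, and `auto_label` and the toy data are hypothetical and do not come from the paper.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """A knowledge-base fact (hypothetical schema, not the paper's)."""
    subject: str
    relation: str
    obj: str


class Example(NamedTuple):
    """An auto-labeled training example."""
    sentence: str
    subject: str
    obj: str
    relation: str


def auto_label(sentences: List[str], kb: List[Triple]) -> List[Example]:
    """Label a sentence with a KB relation when it mentions both the
    subject and the object of a triple (plain substring matching)."""
    examples = []
    for sentence in sentences:
        for triple in kb:
            if triple.subject in sentence and triple.obj in sentence:
                examples.append(
                    Example(sentence, triple.subject, triple.obj, triple.relation)
                )
    return examples


# Toy knowledge base and corpus, invented for illustration.
kb = [
    Triple("Seoul", "capitalOf", "South Korea"),
    Triple("Busan", "locatedIn", "South Korea"),
]
corpus = [
    "Seoul is the capital and largest city of South Korea.",
    "Busan is a port city in South Korea.",
    "Kimchi is a traditional side dish.",
]

examples = auto_label(corpus, kb)
for ex in examples:
    print(ex.relation, "|", ex.sentence)
```

Substring matching keeps the sketch short, but it produces noisy labels, because a sentence can mention both entities without expressing the relation. A practical system would need entity linking and some form of noise handling. For Korean text, matching would also have to account for particles attached to entity mentions.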