Paper title
mOKB6: A Multilingual Open Knowledge Base Completion Benchmark
Paper authors
Paper abstract
Automated completion of open knowledge bases (Open KBs), which are constructed from triples of the form (subject phrase, relation phrase, object phrase) obtained via open information extraction (Open IE) systems, is useful for discovering novel facts that may not be directly present in the text. However, research in Open KB completion (Open KBC) has so far been limited to resource-rich languages like English. Using the latest advances in multilingual Open IE, we construct the first multilingual Open KBC dataset, called mOKB6, containing facts from Wikipedia in six languages (including English). Improving the previous Open KB construction pipeline by performing multilingual coreference resolution and keeping only entity-linked triples, we create a dense Open KB. We experiment with several models for the task and observe a consistent benefit of combining languages with the help of a shared embedding space as well as translations of facts. We also observe that current multilingual models struggle to remember facts seen in languages with different scripts.
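As a rough illustration of the triple format and the "keep only entity-linked triples" step mentioned in the abstract, the following Python sketch may help. It is not the authors' pipeline: the class OpenTriple, the function keep_entity_linked, the field names, the entity IDs (E1, E2, E3), and the toy triples are all hypothetical. It also assumes that a triple counts as entity-linked only when both its subject and its object carry a link, which the abstract does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OpenTriple:
    """One Open IE extraction. All field names are illustrative assumptions."""
    subject: str
    relation: str
    obj: str
    lang: str
    subject_entity: Optional[str] = None  # hypothetical linked-entity ID
    object_entity: Optional[str] = None   # hypothetical linked-entity ID


def keep_entity_linked(triples: List[OpenTriple]) -> List[OpenTriple]:
    """Keep triples whose subject AND object are both linked (an assumption)."""
    return [
        t for t in triples
        if t.subject_entity is not None and t.object_entity is not None
    ]


# Invented toy data, not taken from mOKB6.
sample = [
    OpenTriple("Barack Obama", "was born in", "Honolulu", "en", "E1", "E2"),
    # Unresolved pronoun subject with no entity link: filtered out.
    OpenTriple("he", "moved to", "Chicago", "en", None, "E3"),
]

kept = keep_entity_linked(sample)
for t in kept:
    print((t.subject, t.relation, t.obj))
```

In this toy setting, the second triple is dropped because its subject is an unlinked pronoun. Coreference resolution, which the abstract lists as the other pipeline improvement, would aim to resolve such a mention before filtering so that fewer facts are lost.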