JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension

Paper title

JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension

Authors

So, ByungHoon; Byun, Kyuhong; Kang, Kyungwon; Cho, Seongjin

Abstract

Question Answering (QA) is a task in which a machine reads a given document and a question and finds the answer. Despite impressive progress in NLP, QA remains a challenging problem, especially for non-English languages, because annotated datasets are scarce. In this paper, we present the Japanese Question Answering Dataset, JaQuAD, which is annotated by humans. JaQuAD consists of 39,696 extractive question-answer pairs on Japanese Wikipedia articles. We fine-tuned a baseline model that achieves an F1 score of 78.92% and an exact match (EM) score of 63.38% on the test set. The dataset and our experiments are available at https://github.com/SkelterLabsInc/JaQuAD.
