Paper Title

IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

Authors

James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot, Pradeep Dasigi

Abstract

Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a system's performance at identifying a potential lack of sufficient information and locating sources for that information. To fill this gap, we present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia that provide only partial information to answer them, with the missing information occurring in one or more linked documents. The questions were written by crowd workers who did not have access to any of the linked documents, leading to questions that have little lexical overlap with the contexts where the answers appear. This process also gave many questions without answers, and those that require discrete reasoning, increasing the difficulty of the task. We follow recent modeling work on various reading comprehension datasets to construct a baseline model for this dataset, finding that it achieves 31.1% F1 on this task, while estimated human performance is 88.4%. The dataset, code for the baseline system, and a leaderboard can be found at https://allennlp.org/iirc.
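The reported scores (31.1% F1 for the baseline vs. 88.4% estimated human performance) refer to the answer-level F1 commonly used in reading-comprehension evaluation, which measures token overlap between a predicted answer string and the gold answer. A minimal sketch of that metric, assuming simple whitespace tokenization and lowercasing (the exact normalization used for IIRC may differ, and the function name `token_f1` is ours):

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Count tokens shared by both answers (multiset intersection).
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Example: a partially correct span gets partial credit.
score = token_f1("the quick fox", "the quick brown fox")
# precision = 3/3, recall = 3/4, F1 = 6/7 ≈ 0.857
```

In corpus-level evaluation, this per-question score is typically averaged over all questions; unanswerable questions (which IIRC includes) are usually scored as 1.0 only when the system also predicts "no answer".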
