Paper Title
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation
Paper Authors
Paper Abstract
The full power of human language-based communication cannot be realized without negation. All human languages have some form of negation. Despite this, negation remains a challenging phenomenon for current natural language understanding systems. To facilitate the future development of models that can process negation effectively, we present CONDAQA, the first English reading comprehension dataset that requires reasoning about the implications of negated statements in paragraphs. We collect paragraphs with diverse negation cues, then have crowdworkers ask questions about the implications of the negated statement in the passage. We also have workers make three kinds of edits to the passage -- paraphrasing the negated statement, changing the scope of the negation, and reversing the negation -- resulting in clusters of question-answer pairs that are difficult for models to answer with spurious shortcuts. CONDAQA features 14,182 question-answer pairs with over 200 unique negation cues and is challenging for current state-of-the-art models. The best performing model on CONDAQA (UnifiedQA-v2-3b) achieves only 42% on our consistency metric, well below human performance of 81%. We release our dataset, along with fully-finetuned, few-shot, and zero-shot evaluations, to facilitate the development of future NLP methods that work on negated language.
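The abstract reports a consistency metric but does not define it. One plausible reading, assumed here rather than taken from the text, is that a cluster (a question paired with the original passage and its edited versions) counts as consistent only when the model answers every item in it correctly. The Python sketch below illustrates that reading; the function name `cluster_consistency` and the `(cluster_id, is_correct)` record layout are hypothetical and are not part of the released dataset or evaluation code.

```python
from collections import defaultdict


def cluster_consistency(records):
    """Fraction of clusters in which every prediction is correct.

    `records` is an iterable of (cluster_id, is_correct) pairs.
    This layout is an assumption made for illustration only.
    """
    clusters = defaultdict(list)
    for cluster_id, is_correct in records:
        clusters[cluster_id].append(bool(is_correct))
    if not clusters:
        return 0.0
    # A cluster is consistent only if all of its items are correct.
    consistent = sum(all(flags) for flags in clusters.values())
    return consistent / len(clusters)


# Toy example: two clusters, one fully correct and one with a single error.
example = [
    ("q1", True), ("q1", True), ("q1", True), ("q1", True),
    ("q2", True), ("q2", True), ("q2", False), ("q2", True),
]
print(cluster_consistency(example))
```

In this toy example, per-item accuracy is 7/8 while the cluster-level score is 1/2. Under this reading, a model that answers most items correctly through shortcuts can still score low on consistency.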