Paper Title
Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options
Paper Authors
Paper Abstract
Large-scale natural language inference (NLI) datasets such as SNLI or MNLI have been created by asking crowdworkers to read a premise and write three new hypotheses, one for each possible semantic relationship (entailment, contradiction, and neutral). While this protocol has been used to create useful benchmark data, it remains unclear whether the writing-based annotation protocol is optimal for any purpose, since it has not been evaluated directly. Furthermore, there is ample evidence that crowdworker writing can introduce artifacts into the data. We investigate two alternative protocols that automatically create candidate (premise, hypothesis) pairs for annotators to label. Using these protocols and a writing-based baseline, we collect several new English NLI datasets of over 3k examples each, each collected under a fixed amount of annotator time, with the number of examples varying to fit that time budget. Our experiments on NLI and transfer learning show negative results: none of the alternative protocols outperforms the baseline in evaluations of generalization within NLI or in transfer to outside target tasks. We conclude that crowdworker writing is still the best known option for entailment data, highlighting the need for further data collection work to focus on improving writing-based annotation processes.
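To make the writing-based protocol described above concrete, here is a minimal sketch of the SNLI/MNLI-style example format: a single crowdworker-written premise yields three (premise, hypothesis, label) pairs, one per semantic relation. The class names, field names, and example sentences below are hypothetical illustrations, not drawn from the paper's released data.

```python
from dataclasses import dataclass

# The three semantic relations used in SNLI/MNLI-style annotation.
LABELS = ("entailment", "contradiction", "neutral")

@dataclass
class NLIExample:
    premise: str
    hypothesis: str
    label: str  # one of LABELS

# In the writing-based protocol, an annotator reads one premise and writes
# one new hypothesis for each target label, producing three examples.
premise = "A dog is running through a field."
examples = [
    NLIExample(premise, "An animal is outdoors.", "entailment"),
    NLIExample(premise, "A cat is sleeping on a couch.", "contradiction"),
    NLIExample(premise, "The dog is chasing a ball.", "neutral"),
]

for ex in examples:
    assert ex.label in LABELS
    print(f"{ex.label}: {ex.hypothesis}")
```

The alternative protocols studied in the paper keep this same (premise, hypothesis, label) format, but generate the candidate pairs automatically and ask annotators only to supply the label.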