Paper Title
DEXA: Supporting Non-Expert Annotators with Dynamic Examples from Experts
Authors
Abstract
The success of crowdsourcing-based annotation of text corpora depends on ensuring that crowdworkers are sufficiently well trained to perform the annotation task accurately. To that end, a frequent approach to training annotators is to provide instructions and a few example cases that demonstrate how the task should be performed (referred to as the CONTROL approach). These globally defined "task-level examples", however, (i) often cover only the common cases encountered during an annotation task; and (ii) require effort from crowdworkers during the annotation process to find the most relevant example for the currently annotated sample. To overcome these limitations, we propose to support workers not only with task-level examples but also with "task-instance level" examples that are semantically similar to the currently annotated data sample (referred to as Dynamic Examples for Annotation, DEXA). Such dynamic examples can be retrieved from collections previously labeled by experts, which are usually available as gold standard datasets. We evaluate DEXA on a complex task of annotating participants, interventions, and outcomes (known as PIO) in sentences of medical studies. The dynamic examples are retrieved using BioSent2Vec, an unsupervised semantic sentence similarity method specific to the biomedical domain. Results show that (i) workers using the DEXA approach reach on average much higher agreement with experts (Cohen's Kappa) than workers using the CONTROL approach (average of 0.68 for DEXA vs. 0.40 for CONTROL); and (ii) aggregating as few as three annotations per sample by majority voting, the DEXA approach already reaches substantial agreement with experts of 0.78/0.75/0.69 for P/I/O (vs. 0.73/0.58/0.46 for CONTROL). Finally, (iii) we acquire explicit feedback from workers and show that in the majority of cases (avg. 72%) workers find the dynamic examples useful.
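The three mechanisms named in the abstract (retrieving similar expert-labeled examples, aggregating worker annotations by majority voting, and measuring agreement with Cohen's Kappa) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(sentence, label, vector)` layout of the expert pool, and the use of cosine similarity over precomputed sentence vectors are all assumptions made for illustration. The vectors stand in for embeddings such as those a BioSent2Vec model would produce; no BioSent2Vec API is called here.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_dynamic_examples(query_vec, expert_pool, k=2):
    """Return the k expert-labeled examples most similar to the query sentence.

    expert_pool is a hypothetical list of (sentence, label, vector) tuples,
    where the vectors are assumed to be precomputed sentence embeddings.
    """
    ranked = sorted(expert_pool, key=lambda ex: cosine(query_vec, ex[2]), reverse=True)
    return [(sentence, label) for sentence, label, _ in ranked[:k]]


def majority_vote(labels):
    """Aggregate several workers' labels; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def cohens_kappa(a, b):
    """Cohen's Kappa between two annotators' label sequences of equal length."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label frequencies.
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data, invented purely for illustration.
pool = [
    ("Patients with type 2 diabetes were enrolled.", "P", [1.0, 0.0]),
    ("Mortality at 30 days was the primary endpoint.", "O", [0.0, 1.0]),
    ("Adults with diabetes were recruited.", "P", [0.9, 0.1]),
]
examples = retrieve_dynamic_examples([1.0, 0.0], pool, k=2)
aggregated = majority_vote(["P", "P", "O"])
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In this toy run, the two "P" sentences are retrieved for the query, the three worker labels aggregate to "P", and the two annotators agree on three of four items against a chance agreement of 0.5, giving a Kappa of 0.5.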