Paper Title

Active Learning from Crowd in Document Screening

Authors

Evgeny Krivosheev, Burcu Sayin, Alessandro Bozzon, Zoltán Szlávik

Abstract

In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen documents with a set of machine-learning filters. Specifically, we focus on building a set of machine-learning classifiers that evaluate documents, and then on screening the documents efficiently. This is a challenging task, since the budget is limited and there are countless ways to spend it. We propose objective-aware sampling, a screening-specific sampling technique for multi-label active learning that selects which unlabeled documents to query for annotation. Our algorithm decides which machine filter needs more training data and how to choose unlabeled items to annotate so as to minimize the risk of overall classification errors, rather than the error of any single filter. We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.
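
The abstract does not give the scoring rule, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that a document is included only if it passes every filter, that the filters are independent, and that a hypothetical table `pass_probs[i][k]` holds the current estimate that document `i` passes filter `k`. The function names and the scoring formula are assumptions made for this example. A label for the pair (document, filter) is scored higher when that filter is uncertain and the remaining filters are unlikely to reject the document anyway.

```python
from math import prod


def uncertainty(p):
    """Binary uncertainty: 1.0 at p = 0.5, falling to 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def pick_query(pass_probs):
    """Return (doc_index, filter_index) of the label to request next.

    pass_probs[i][k] is a hypothetical estimate that document i passes
    filter k. The score below is an illustrative heuristic (an assumption
    of this sketch), not the formula from the paper.
    """
    best, best_score = None, -1.0
    for i, row in enumerate(pass_probs):
        for k, p in enumerate(row):
            # Probability that all *other* filters let the document through.
            others = prod(q for j, q in enumerate(row) if j != k)
            # Uncertain filter matters only if the others do not reject.
            score = uncertainty(p) * others
            if score > best_score:
                best, best_score = (i, k), score
    return best


probs = [
    [0.5, 0.1],    # filter 0 is maximally uncertain, but filter 1 likely rejects
    [0.6, 0.9],    # filter 0 is fairly uncertain, filter 1 likely passes
    [0.95, 0.98],  # both filters are confident
]
print(pick_query(probs))  # (1, 0)
```

Plain per-filter uncertainty sampling would pick document 0 for filter 0, since its estimate is exactly 0.5. The sketch instead picks document 1, because document 0 is very likely to be excluded by filter 1 regardless of what filter 0 says, so a label there would do little to reduce the overall screening error.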
