Paper title
Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming
Paper authors
Paper abstract
Weak Supervision (WS) techniques allow users to efficiently create large training datasets by programmatically labeling data with heuristic sources of supervision. While the success of WS relies heavily on the provided labeling heuristics, the process by which these heuristics are created in practice has remained under-explored. In this work, we formalize the development of labeling heuristics as an interactive procedure, built around the existing workflow in which users draw ideas from a selected set of development data when designing the heuristic sources. With this formalism, we study two core problems: how to strategically select the development data to guide users in efficiently creating informative heuristics, and how to exploit the information within the development process to contextualize and better learn from the resultant heuristics. Building upon two novel methodologies that tackle these respective problems, we present Nemo, an end-to-end interactive system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
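To make "programmatically labeling data with heuristic sources of supervision" concrete, here is a minimal Python sketch of the general data programming idea. It is not Nemo's method: the keyword heuristics, the label constants, and the `majority_vote` aggregator are all hypothetical illustrations. Plain majority voting stands in for the learned label models that WS pipelines typically use, and neither the development data selection nor the contextualization step described in the abstract is shown.

```python
from collections import Counter

# Hypothetical label constants for a binary sentiment task.
ABSTAIN, NEG, POS = -1, 0, 1

# Each labeling heuristic votes for a class or abstains.
def lf_contains_great(text):
    return POS if "great" in text.lower() else ABSTAIN

def lf_contains_terrible(text):
    return NEG if "terrible" in text.lower() else ABSTAIN

def lf_exclamation(text):
    return POS if text.endswith("!") else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_great, lf_contains_terrible, lf_exclamation]

def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Aggregate heuristic votes; abstain when no heuristic fires or on a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes)
    top_label, top_count = counts.most_common(1)[0]
    # More than one label sharing the top count means a tie.
    if sum(1 for c in counts.values() if c == top_count) > 1:
        return ABSTAIN
    return top_label

unlabeled = [
    "Great phone, works well",
    "Terrible battery life",
    "It arrived on Tuesday",
    "Terrible screen but great price!",
]
weak_labels = [majority_vote(t) for t in unlabeled]
print(weak_labels)  # [1, 0, -1, 1]
```

In this toy setup, examples that receive a non-abstain label would form the weakly labeled training set for a downstream model. The quality of that set depends entirely on the heuristics the user writes, which is the part of the workflow the abstract says Nemo aims to guide.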