Paper title
WeaNF: Weak Supervision with Normalizing Flows
Authors
Abstract
Weak supervision is a popular approach to reducing the need for costly manual annotation of large data sets, but it introduces problems of noisy labels, coverage, and bias. Methods for overcoming these problems have relied either on discriminative models trained with cost functions specific to weak supervision or, more recently, on generative models that try to model the output of the automatic annotation process. In this work, we explore a novel direction of generative modeling for weak supervision: instead of modeling the output of the annotation process (the labeling function matches), we generatively model the input-side data distributions (the feature space) covered by labeling functions. Specifically, we estimate a density for each weak labeling source, or labeling function, using normalizing flows. An integral part of our method is the flow-based modeling of multiple simultaneously matching labeling functions, which captures phenomena such as labeling function overlap and correlations. We analyze the effectiveness and modeling capabilities of the approach on various commonly used weak supervision data sets and show that weakly supervised normalizing flows compare favorably to standard weak supervision baselines.
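The core idea of estimating one density per labeling function over the feature space can be illustrated with a minimal sketch. This is not the paper's model. It assumes the simplest possible normalizing flow, an element-wise affine map onto a standard normal base (equivalent to a diagonal Gaussian), fits each flow independently in closed form, and does not model simultaneously matching labeling functions jointly. The names `AffineFlow`, `fit_flows`, `predict`, the toy data, and the two threshold-based labeling functions are all hypothetical.

```python
import numpy as np


class AffineFlow:
    """Toy flow: z = (x - shift) / scale with a standard normal base density."""

    def fit(self, x):
        # Closed-form maximum likelihood for an element-wise affine map.
        self.shift = x.mean(axis=0)
        self.scale = x.std(axis=0) + 1e-6
        return self

    def log_prob(self, x):
        z = (x - self.shift) / self.scale
        base = -0.5 * (z ** 2 + np.log(2 * np.pi)).sum(axis=1)
        # Change of variables: log p(x) = log p_z(z) + log|det dz/dx|.
        log_det = -np.log(self.scale).sum()
        return base + log_det


def fit_flows(features, matches):
    """Fit one flow per labeling function on the inputs that function matches."""
    return [AffineFlow().fit(features[matches[:, j] == 1])
            for j in range(matches.shape[1])]


def predict(flows, lf_to_class, features, num_classes):
    """Assign each input to the class whose labeling-function densities are largest."""
    class_scores = np.full((features.shape[0], num_classes), -np.inf)
    for flow, c in zip(flows, lf_to_class):
        # Sum the densities of all labeling functions voting for class c (in log space).
        class_scores[:, c] = np.logaddexp(class_scores[:, c], flow.log_prob(features))
    return class_scores.argmax(axis=1)


# Hypothetical toy data: two well-separated classes in a 2-D feature space.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(-3.0, 1.0, size=(200, 2)),
                      rng.normal(3.0, 1.0, size=(200, 2))])
true_labels = np.array([0] * 200 + [1] * 200)

# Two hypothetical labeling functions with partial coverage:
# LF 0 fires when the first feature is below -2 (votes class 0),
# LF 1 fires when it is above 2 (votes class 1).
matches = np.stack([features[:, 0] < -2.0, features[:, 0] > 2.0], axis=1).astype(int)

flows = fit_flows(features, matches)
pred = predict(flows, lf_to_class=[0, 1], features=features, num_classes=2)
accuracy = (pred == true_labels).mean()
print(f"accuracy on toy data: {accuracy:.3f}")
```

Because each flow is a density over inputs rather than over labeling function outputs, it can score inputs that no labeling function matched, which is one way a feature-space model can address limited coverage. A practical model would use a more expressive flow and would account for overlapping matches, as the abstract describes.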