Paper Title
Regularized Data Programming with Automated Bayesian Prior Selection
Paper Authors
Paper Abstract
The cost of manual data labeling can be a significant obstacle in supervised learning. Data programming (DP) offers a weakly supervised solution for training dataset creation, wherein the outputs of user-defined programmatic labeling functions (LFs) are reconciled through unsupervised learning. However, DP can fail to outperform an unweighted majority vote in some scenarios, including low-data contexts. This work introduces a Bayesian extension of classical DP that mitigates failures of unsupervised learning by augmenting the DP objective with regularization terms. Regularized learning is achieved through maximum a posteriori estimation with informative priors. Majority vote is proposed as a proxy signal for automated prior parameter selection. Results suggest that regularized DP improves performance relative to maximum likelihood and majority voting, confers greater interpretability, and bolsters performance in low-data regimes.
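To make the abstract's description concrete, the following is a minimal sketch of the kind of objective it describes, under assumed notation (the paper's own symbols may differ): let \Lambda denote the matrix of labeling-function outputs and \theta the label model parameters. Classical DP fits \theta by maximum likelihood over \Lambda alone; the Bayesian extension adds a log-prior term, so that MAP estimation regularizes the unsupervised objective:

% Assumed notation, for illustration only:
%   \log p(\Lambda \mid \theta): the classical (unsupervised) DP likelihood
%   \log p(\theta): informative prior acting as the regularization term
\[
\hat{\theta}_{\mathrm{MAP}}
  \;=\; \arg\max_{\theta} \; \log p(\Lambda \mid \theta) \;+\; \log p(\theta)
\]

Consistent with the abstract, the parameters of the informative prior p(\theta) would be selected automatically, with the unweighted majority vote over the labeling functions serving as the proxy signal for that selection.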