Paper Title

AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels

Authors

Nicholas Roberts, Xintong Li, Tzu-Heng Huang, Dyah Adila, Spencer Schoenberg, Cheng-Yu Liu, Lauren Pick, Haotian Ma, Aws Albarghouthi, Frederic Sala

Abstract

Weak supervision (WS) is a powerful method to build labeled datasets for training supervised models in the face of little-to-no labeled data. It replaces hand-labeling data with aggregating multiple noisy-but-cheap label estimates expressed by labeling functions (LFs). While it has been used successfully in many domains, weak supervision's application scope is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features. To address this, a handful of methods have proposed automating the LF design process using a small set of ground truth labels. In this work, we introduce AutoWS-Bench-101: a framework for evaluating automated WS (AutoWS) techniques in challenging WS settings -- a set of diverse application domains on which it has been previously difficult or impossible to apply traditional WS techniques. While AutoWS is a promising direction toward expanding the application-scope of WS, the emergence of powerful methods such as zero-shot foundation models reveals the need to understand how AutoWS techniques compare or cooperate with modern zero-shot or few-shot learners. This informs the central question of AutoWS-Bench-101: given an initial set of 100 labels for each task, we ask whether a practitioner should use an AutoWS method to generate additional labels or use some simpler baseline, such as zero-shot predictions from a foundation model or supervised learning. We observe that in many settings, it is necessary for AutoWS methods to incorporate signal from foundation models if they are to outperform simple few-shot baselines, and AutoWS-Bench-101 promotes future research in this direction. We conclude with a thorough ablation study of AutoWS methods.
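The abstract frames weak supervision as replacing hand-labeling with the aggregation of many noisy-but-cheap labeling functions (LFs). As a rough illustration of that idea only (not the paper's code, an AutoWS method, or the AutoWS-Bench-101 framework), the minimal sketch below defines a few toy heuristic LFs over strings and combines their votes with a simple majority-vote aggregator. The LF heuristics, the toy data, and the plain majority-vote rule are all illustrative assumptions; real WS pipelines typically fit a learned label model over the LF outputs instead.

```python
# Minimal weak-supervision sketch: noisy labeling functions vote, votes are aggregated.
# Everything here (LFs, data, aggregator) is a toy assumption for illustration.
import numpy as np

ABSTAIN = -1  # conventional value for "this LF does not fire on this example"

def lf_long_text(x):
    # Heuristic: long strings are labeled 1.
    return 1 if len(x) > 20 else ABSTAIN

def lf_contains_digit(x):
    # Heuristic: strings containing a digit are labeled 0.
    return 0 if any(c.isdigit() for c in x) else ABSTAIN

def lf_uppercase_start(x):
    # Heuristic: strings starting with an uppercase letter are labeled 1.
    return 1 if x[:1].isupper() else ABSTAIN

def majority_vote(votes, n_classes=2):
    """Aggregate per-example LF votes; return ABSTAIN if no LF fires."""
    labels = []
    for row in votes:
        counts = np.bincount([v for v in row if v != ABSTAIN], minlength=n_classes)
        labels.append(int(counts.argmax()) if counts.sum() > 0 else ABSTAIN)
    return np.array(labels)

if __name__ == "__main__":
    unlabeled = ["A fairly long example sentence here", "short 42", "Tiny"]
    lfs = [lf_long_text, lf_contains_digit, lf_uppercase_start]
    votes = np.array([[lf(x) for lf in lfs] for x in unlabeled])
    print(majority_vote(votes))  # pseudo-labels that would train a downstream model
```

In the benchmark's framing, pseudo-labels produced this way (or by an AutoWS method that generates the LFs automatically from 100 ground-truth labels) compete against simpler baselines such as few-shot supervised learning or zero-shot predictions from a foundation model.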
