论文标题
协作情报编排:半监督学习和积极学习的基于不一致的融合
Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning
论文作者
论文摘要
虽然注释大量的数据满足复杂的学习模型,但对于许多现实世界中的应用程序可能会过于良好。主动学习(AL)和半监督学习(SSL)是两个有效但经常被隔离的方法,可以减轻渴望数据的问题。最近的一些研究探索了将AL和SSL相结合以更好地探测未标记数据的潜力。但是,几乎所有这些当代的SSL-AL作品都采用了简单的组合策略,而忽略了SSL和AL的固有关系。此外,其他方法在处理大型,高维数据集时会遭受高计算成本。通过行业标记数据的实践的激励,我们提出了一种基于创新的基于创新的不一致的虚拟虚拟对手活跃学习(理想)算法,以进一步研究SSL-AL的潜在优势,并实现AL和SSL的相互性增强,即对SSL的ssl,即SSL的标记信息,并为AL提供exorptions Al samseltions,并为AL提供范围,并提供exorptions Al的范围。 SSL。我们通过不同粒度的增强策略(包括细粒度的连续扰动探索和粗粒数据转换)来估计未标记的样品的不一致。在文本和图像域中,广泛的实验验证了所提出的算法的有效性,并将其与最先进的基线进行了比较。两项现实的案例研究可视化应用和部署所提出的数据采样算法的实际工业价值。
While annotating decent amounts of data to satisfy sophisticated learning models can be cost-prohibitive for many real-world applications. Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem. Some recent studies explored the potential of combining AL and SSL to better probe the unlabeled data. However, almost all these contemporary SSL-AL works use a simple combination strategy, ignoring SSL and AL's inherent relation. Further, other methods suffer from high computational costs when dealing with large-scale, high-dimensional datasets. Motivated by the industry practice of labeling data, we propose an innovative Inconsistency-based virtual aDvErsarial Active Learning (IDEAL) algorithm to further investigate SSL-AL's potential superiority and achieve mutual enhancement of AL and SSL, i.e., SSL propagates label information to unlabeled samples and provides smoothed embeddings for AL, while AL excludes samples with inconsistent predictions and considerable uncertainty for SSL. We estimate unlabeled samples' inconsistency by augmentation strategies of different granularities, including fine-grained continuous perturbation exploration and coarse-grained data transformations. Extensive experiments, in both text and image domains, validate the effectiveness of the proposed algorithm, comparing it against state-of-the-art baselines. Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithm.