Paper title
Active Learning Through a Covering Lens
Paper authors
Paper abstract
Deep active learning aims to reduce the annotation cost of training deep models, which are notoriously data-hungry. Until recently, deep active learning methods were ineffectual in the low-budget regime, where only a small number of examples are annotated. The situation has been alleviated by recent advances in representation and self-supervised learning, which endow the geometry of the data representation with rich information about the points. Taking advantage of this progress, we study the problem of subset selection for annotation through a "covering" lens and propose ProbCover, a new active learning algorithm for the low-budget regime that seeks to maximize Probability Coverage. We then describe a dual way to view the proposed formulation, from which one can derive strategies suitable for the high-budget regime of active learning, related to existing methods like Coreset. We conclude with extensive experiments evaluating ProbCover in the low-budget regime. We show that our principled active learning strategy improves the state of the art in the low-budget regime on several image recognition benchmarks. The method is especially beneficial in the semi-supervised setting, where it allows state-of-the-art semi-supervised methods to match the performance of fully supervised methods while using far fewer labels. Code is available at https://github.com/avihu111/TypiClust.
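The "covering" view of subset selection can be illustrated with a small sketch. The code below is not the authors' implementation (see the repository linked above for that); it is a generic greedy maximum-coverage heuristic, assuming that a point counts as covered when it lies within a fixed radius of a selected point in embedding space. The function name `greedy_ball_cover`, the radius parameter `delta`, and the use of Euclidean distance are all assumptions made for illustration.

```python
import numpy as np


def greedy_ball_cover(embeddings, budget, delta):
    """Greedily pick up to `budget` points whose delta-balls cover the most points.

    Illustrative sketch only: `delta` (ball radius) and Euclidean distance
    are assumptions, not details taken from the abstract.
    """
    x = np.asarray(embeddings, dtype=float)
    # Pairwise Euclidean distances; O(n^2) memory, so only for small n.
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # in_ball[i, j] is True when point j lies inside the ball centred at i.
    in_ball = dists <= delta
    covered = np.zeros(len(x), dtype=bool)
    selected = []
    for _ in range(budget):
        # How many still-uncovered points each candidate ball would add.
        gains = (in_ball & ~covered).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break  # everything reachable is already covered
        selected.append(best)
        covered |= in_ball[best]
    # Return the chosen indices and the fraction of points covered.
    return selected, float(covered.mean())


if __name__ == "__main__":
    points = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [9, 9]]
    chosen, coverage = greedy_ball_cover(points, budget=2, delta=0.5)
    print(chosen, coverage)
```

In this toy example the two selected points come from the two densest groups, and the isolated point stays uncovered. In a realistic setting the embeddings would come from a self-supervised model, and the radius would need to be tuned; how to choose it is outside the scope of this sketch.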