Paper Title
Fast Rates in Pool-Based Batch Active Learning
Paper Authors
Paper Abstract
We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle. Sampling labels in batches is highly desirable in practice because it reduces the number of interactive rounds with the labeling oracle (often human beings). However, batch active learning typically pays the price of reduced adaptivity, leading to suboptimal results. In this paper we propose a solution which requires a careful trade-off between the informativeness of the queried points and their diversity. We theoretically investigate batch active learning in the practically relevant scenario where the unlabeled pool of data is available beforehand (pool-based active learning). We analyze a novel stage-wise greedy algorithm and show that, as a function of the label complexity, the excess risk of this algorithm matches the known minimax rates in standard statistical learning settings. Our results also exhibit a mild dependence on the batch size. These are the first theoretical results that employ careful trade-offs between informativeness and diversity to rigorously quantify the statistical performance of batch active learning in the pool-based scenario.