Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance

Title

Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance

Authors

Neil G. Marchant, Benjamin I. P. Rubinstein

Abstract

Important tasks such as record linkage and extreme classification exhibit extreme class imbalance, with 1 minority instance to every 1 million or more majority instances. Obtaining a sufficient sample of all classes, even just to achieve statistically significant evaluation, is so challenging that most current approaches yield poor estimates or incur impractical cost. Where importance sampling has been applied to this challenge, restrictive constraints are placed on performance metrics, estimates do not come with appropriate guarantees, or evaluations cannot adapt to incoming labels. This paper develops a framework for online evaluation based on adaptive importance sampling. Given a target performance metric and a model for $p(y|x)$, the framework adapts a distribution over items to label in order to maximize statistical precision. We establish strong consistency and a central limit theorem for the resulting performance estimates, and instantiate our framework with worked examples that leverage Dirichlet-tree models. Experiments show that, on fixed label budgets, the framework achieves a lower average MSE than state-of-the-art methods.
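The core idea of the abstract can be illustrated with a small Python sketch: draw items to label from an adaptive proposal, then reweight the labels so that the metric estimate stays consistent. This is not the authors' code. It assumes F1 as the target metric, written as a ratio $\sum_i y_i \hat{y}_i / \sum_i \tfrac{1}{2}(y_i + \hat{y}_i)$. In place of the paper's Dirichlet-tree model, it uses a hypothetical fixed per-item estimate `p` of $p(y=1|x)$. The proposal is proportional to the model-expected root-mean-square residual of that ratio, which we assume is the variance-minimizing form, and it is mixed with a uniform distribution so every item stays reachable. Adaptation is crude: labelled items are simply marked as certain. The names `f1_terms`, `proposal` and `ais_f1` and the synthetic pool are illustrative only.

```python
import numpy as np


def f1_terms(y, y_hat):
    """Per-item numerator and denominator of F1 = sum(y*y_hat) / sum((y + y_hat)/2)."""
    return y * y_hat, 0.5 * (y + y_hat)


def proposal(p, y_hat, f_est, eps):
    """Hypothetical proposal: q_i proportional to sqrt(E[(num_i - F*den_i)^2])
    with y_i ~ Bernoulli(p_i), mixed with a uniform component of weight eps."""
    r1 = y_hat - f_est * 0.5 * (1.0 + y_hat)  # residual if y_i = 1
    r0 = -f_est * 0.5 * y_hat                 # residual if y_i = 0
    score = np.sqrt(p * r1**2 + (1.0 - p) * r0**2)
    n = len(p)
    if score.sum() > 0:
        q = (1.0 - eps) * score / score.sum() + eps / n
    else:
        q = np.full(n, 1.0 / n)
    return q / q.sum()


def ais_f1(y_true, y_hat, p, budget=2000, batch=100, eps=0.1, f_init=0.5, seed=0):
    """Adaptive importance-sampling estimate of F1 using `budget` oracle labels."""
    rng = np.random.default_rng(seed)
    n = len(y_hat)
    p = np.array(p, dtype=float)  # copy: model estimate of p(y=1|x), updated as labels arrive
    f_est = f_init
    num_sum = den_sum = 0.0
    for _ in range(budget // batch):
        q = proposal(p, y_hat, f_est, eps)
        idx = rng.choice(n, size=batch, replace=True, p=q)
        y = y_true[idx]                 # query the labelling oracle
        w = (1.0 / n) / q[idx]          # importance weights; target is uniform over the pool
        num, den = f1_terms(y, y_hat[idx])
        num_sum += np.sum(w * num)
        den_sum += np.sum(w * den)
        if den_sum > 0:
            f_est = num_sum / den_sum   # ratio estimator: consistent, not unbiased
        p[idx] = y                      # crude adaptation: labelled items become certain
    return f_est


# Synthetic pool: 100,000 items, 100 positives (1 in 1,000).
n = 100_000
y_true = np.zeros(n, dtype=int)
y_true[:100] = 1
y_hat = np.zeros(n, dtype=int)
y_hat[:80] = 1        # 80 true positives
y_hat[100:170] = 1    # 70 false positives; items 80..99 are 20 false negatives
p = np.full(n, 1e-6)  # hypothetical model scores
p[:80], p[100:170], p[80:100] = 0.7, 0.3, 0.05

true_f1 = 2 * 80 / (2 * 80 + 70 + 20)  # = 0.64
est = ais_f1(y_true, y_hat, p)
print(f"true F1 = {true_f1:.3f}, AIS estimate = {est:.3f}")
```

The uniform component (`eps`) keeps every item's sampling probability positive. Importance sampling relies on this for consistency when the model for $p(y|x)$ is wrong. The rest of the proposal concentrates labels on the 170 flagged or plausibly missed items instead of spreading them evenly over the 100,000-item pool.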
