Paper Title
Minority Class Oriented Active Learning for Imbalanced Datasets
Paper Authors
Abstract
Active learning aims to optimize the dataset annotation process when resources are constrained. Most existing methods are designed for balanced datasets. Their practical applicability is limited by the fact that a majority of real-life datasets are actually imbalanced. Here, we introduce a new active learning method designed for imbalanced datasets. It favors samples likely to belong to minority classes so as to reduce the imbalance of the labeled subset and create a better representation for these classes. We also compare two training schemes for active learning: (1) the one commonly deployed in deep active learning, which fine-tunes the model at each iteration, and (2) a scheme inspired by transfer learning, which exploits a generic pre-trained model and trains a shallow classifier at each iteration. Evaluation is performed on three imbalanced datasets. Results show that the proposed active learning method outperforms competitive baselines. Equally interesting, they also indicate that the transfer learning training scheme outperforms model fine-tuning when features are transferable from the generic dataset to the unlabeled one. This last result is surprising and should encourage the community to explore the design of deep active learning methods.
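The second training scheme and the minority-oriented acquisition idea can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the features, the nearest-centroid "shallow classifier", and the rarity-based acquisition score are all simplifying assumptions made for illustration. Features are treated as frozen (as if produced by a generic pre-trained model), a cheap classifier is refit at each iteration, and samples predicted to belong to classes that are rare in the labeled subset are selected first.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pre-extracted features for an imbalanced pool (hypothetical setup):
# class 0 is the majority, class 1 the minority.
n_maj, n_min = 200, 20
X = np.vstack([rng.normal(0.0, 1.0, (n_maj, 8)),
               rng.normal(3.0, 1.0, (n_min, 8))])
y = np.array([0] * n_maj + [1] * n_min)

labeled = list(rng.choice(len(X), 10, replace=False))
unlabeled = [i for i in range(len(X)) if i not in labeled]

def fit_centroids(X, y, idx):
    """Shallow classifier: one centroid per class over the frozen features."""
    return {c: X[[i for i in idx if y[i] == c]].mean(axis=0)
            for c in set(y[i] for i in idx)}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

for _ in range(5):  # active learning iterations
    centroids = fit_centroids(X, y, labeled)
    counts = {c: sum(y[i] == c for i in labeled) for c in centroids}
    # Acquisition: prefer samples predicted to belong to classes that are
    # rare in the labeled subset (minority-oriented selection).
    scores = {i: 1.0 / counts[predict(centroids, X[i])] for i in unlabeled}
    batch = sorted(unlabeled, key=lambda i: -scores[i])[:5]
    labeled += batch  # oracle labels the selected batch
    unlabeled = [i for i in unlabeled if i not in batch]

final_counts = {c: int(sum(y[i] == c for i in labeled)) for c in (0, 1)}
print(final_counts)
```

Because only the shallow classifier is refit, each iteration is cheap compared with fine-tuning a deep network, which is what makes the transfer learning scheme attractive when the pre-trained features transfer well.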