Paper Title

Active Data Discovery: Mining Unknown Data using Submodular Information Measures

Authors

Suraj Kothawade, Shivang Chopra, Saikat Ghosh, Rishabh Iyer

Abstract

Active learning is a common yet powerful framework for iteratively and adaptively sampling subsets of an unlabeled set with a human in the loop, with the goal of labeling efficiency. Most real-world datasets are imbalanced in classes or slices, so parts of the dataset are rare. As a result, much work has gone into designing active learning approaches that mine these rare data instances. Most approaches assume access to a seed set that contains these rare instances. Under more extreme rareness, however, it is reasonable to assume that the rare instances (either classes or slices) may not even be present in the labeled seed set, and a critical need for the active learning paradigm is to discover them efficiently. In this work, we provide an active data discovery framework that mines unknown data slices and classes efficiently using the submodular conditional gain and submodular conditional mutual information functions. We provide a general algorithmic framework that works in a number of scenarios, including image classification and object detection, and handles both rare classes and rare slices present in the unlabeled set. Compared to existing state-of-the-art active learning approaches, our approach shows significant gains in accuracy and labeling efficiency for actively discovering these rare classes and slices.
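To make the idea of a submodular conditional gain concrete, here is a minimal sketch. It uses the standard definition f(A | P) = f(A ∪ P) − f(P), where P is the set of already labeled points, and maximizes it greedily. The choice of a facility-location function for f, the toy 1-D points, the similarity measure, and all function names are assumptions made for illustration only; the abstract does not say which instantiation the authors use.

```python
# Illustrative sketch only: the facility-location function, the toy data and
# the helper names below are assumptions, not the paper's implementation.

def facility_location(subset, sim):
    """f(A) = sum over all points i of max_{j in A} sim[i][j]; f(empty) = 0."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def conditional_gain(subset, labeled, sim):
    """Submodular conditional gain f(A | P) = f(A ∪ P) - f(P)."""
    union = set(subset) | set(labeled)
    return facility_location(union, sim) - facility_location(set(labeled), sim)


def greedy_select(candidates, labeled, sim, budget):
    """Greedily pick up to `budget` candidates maximizing f(A | labeled)."""
    selected = []
    remaining = list(candidates)
    for _ in range(budget):
        if not remaining:
            break
        base = conditional_gain(selected, labeled, sim)
        # Marginal gain of adding e to the current selection.
        best = max(
            remaining,
            key=lambda e: conditional_gain(selected + [e], labeled, sim) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: indices 0 and 1 are labeled; index 4 lies far from everything labeled.
points = [0.0, 0.1, 0.05, 0.2, 5.0]
sim = [[1.0 / (1.0 + abs(a - b)) for b in points] for a in points]

picked = greedy_select(candidates=[2, 3, 4], labeled=[0, 1], sim=sim, budget=2)
print(picked)
```

In this toy setup the point far from the labeled set is chosen first, because points that the labeled set already represents well add little conditional gain. That is the intuition behind using such a function to surface data the seed set does not cover.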
