Title
Towards a More Rigorous Science of Blindspot Discovery in Image Classification Models
Authors
Abstract
A growing body of work studies Blindspot Discovery Methods ("BDM"s): methods that use an image embedding to find semantically meaningful (i.e., united by a human-understandable concept) subsets of the data where an image classifier performs significantly worse. Motivated by observed gaps in prior work, we introduce SpotCheck, a new framework for evaluating BDMs that uses synthetic image datasets to train models with known blindspots, and PlaneSpot, a new BDM that uses a 2D image representation. We use SpotCheck to run controlled experiments that identify factors influencing BDM performance (e.g., the number of blindspots in a model, or the features used to define a blindspot) and show that PlaneSpot is competitive with, and in many cases outperforms, existing BDMs. Importantly, we validate these findings with additional experiments that use real image data from MS-COCO, a large image benchmark dataset. Our findings suggest several promising directions for future work on BDM design and evaluation. Overall, we hope that the methodology and analyses presented in this work will help facilitate a more rigorous science of blindspot discovery.
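The abstract does not spell out how PlaneSpot works. The sketch below only illustrates the general idea it describes: represent images in 2D, group nearby images, and flag groups where the classifier does much worse than average. It also mimics SpotCheck's premise by checking the discoveries against a blindspot planted in synthetic data. All of the following are illustrative assumptions and are not the paper's algorithm or metrics: the precomputed 2D coordinates, the choice of k-means for grouping, the error-rate margin, the 0.8 overlap threshold, and every function name.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    x: float            # first coordinate of an assumed 2D image representation
    y: float            # second coordinate
    correct: bool       # did the classifier get this image right?
    in_blindspot: bool  # ground truth; known only because the data is synthetic


def make_synthetic(n=600, seed=0):
    """Hypothetical toy dataset with one planted blindspot around (3, 3)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (3.0, 3.0), (-3.0, 3.0)]
    data = []
    for i in range(n):
        cx, cy = centers[i % 3]
        in_bs = (cx, cy) == (3.0, 3.0)
        error_rate = 0.6 if in_bs else 0.05  # planted gap in performance
        data.append(Example(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5),
                            rng.random() >= error_rate, in_bs))
    return data


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means, seeded with the first k points for determinism."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: (px - centroids[c][0]) ** 2
                      + (py - centroids[c][1]) ** 2)
                  for px, py in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append((sum(p[0] for p in members) / len(members),
                            sum(p[1] for p in members) / len(members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid
        if new == centroids:
            break
        centroids = new
    return labels


def discover_blindspots(data, k=3, min_size=20, margin=0.1):
    """Hypothetical BDM: return index lists of clusters with elevated error."""
    labels = kmeans([(e.x, e.y) for e in data], k)
    overall_err = sum(not e.correct for e in data) / len(data)
    found = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if len(members) < min_size:
            continue  # too small to be a meaningful subset
        err = sum(not data[i].correct for i in members) / len(members)
        if err >= overall_err + margin:
            found.append((err, members))
    found.sort(key=lambda t: t[0], reverse=True)  # worst clusters first
    return [members for _, members in found]


def recovered(data, discovered, threshold=0.8):
    """Hypothetical check: does some discovered set match the planted blindspot?"""
    truth = {i for i, e in enumerate(data) if e.in_blindspot}
    for members in discovered:
        s = set(members)
        overlap = len(s & truth)
        if overlap / len(s) >= threshold and overlap / len(truth) >= threshold:
            return True
    return False


data = make_synthetic()
found = discover_blindspots(data)
print(len(found), recovered(data, found))
```

In this toy setup the three groups are well separated, so the planted cluster should be the only one flagged and the check should report it as recovered. Real BDMs differ mainly in how they build the image representation and how they group images. According to the abstract, SpotCheck's controlled experiments vary factors such as the number of blindspots in a model.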