Paper Title
Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning
Paper Authors
Paper Abstract
The increasing impact of black box models, and particularly of unsupervised ones, comes with an increasing interest in tools to understand and interpret them. In this paper, we consider in particular how to characterise visual groupings discovered automatically by deep neural networks, starting with state-of-the-art clustering methods. In some cases, clusters readily correspond to an existing labelled dataset. However, often they do not, yet they still maintain an "intuitive interpretability". We introduce two concepts, visual learnability and describability, that can be used to quantify the interpretability of arbitrary image groupings, including unsupervised ones. The idea is to measure (1) how well humans can learn to reproduce a grouping by measuring their ability to generalise from a small set of visual examples (learnability) and (2) whether the set of visual examples can be replaced by a succinct, textual description (describability). By assessing human annotators as classifiers, we remove the subjective quality of existing evaluation metrics. For better scalability, we finally propose a class-level captioning system to generate descriptions for visual groupings automatically and compare it to human annotators using the describability metric.