Paper Title
Contrastive Corpus Attribution for Explaining Representations
Paper Authors
Paper Abstract
Despite the widespread use of unsupervised models, very few methods are designed to explain them. Most explanation methods explain a scalar model output. However, unsupervised models output representation vectors, the elements of which are not good candidates to explain because they lack semantic meaning. To bridge this gap, recent works defined a scalar explanation output: a dot product-based similarity in the representation space to the sample being explained (i.e., an explicand). Although this enabled explanations of unsupervised models, the interpretation of this approach can still be opaque because similarity to the explicand's representation may not be meaningful to humans. To address this, we propose contrastive corpus similarity, a novel and semantically meaningful scalar explanation output based on a reference corpus and a contrasting foil set of samples. We demonstrate that contrastive corpus similarity is compatible with many post-hoc feature attribution methods to generate COntrastive COrpus Attributions (COCOA) and quantitatively verify that features important to the corpus are identified. We showcase the utility of COCOA in two ways: (i) we draw insights by explaining augmentations of the same image in a contrastive learning setting (SimCLR); and (ii) we perform zero-shot object localization by explaining the similarity of image representations to jointly learned text representations (CLIP).
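The abstract describes contrastive corpus similarity as a dot product-based score in representation space that contrasts a corpus of reference samples against a foil set. Below is a minimal sketch of that idea, assuming the score is the mean normalized dot-product similarity of the explicand's representation to the corpus minus the mean similarity to the foil set; the function name and exact normalization are illustrative assumptions, and the precise definition is given in the paper.

```python
import numpy as np

def contrastive_corpus_similarity(explicand_rep, corpus_reps, foil_reps):
    """Hypothetical sketch of a contrastive corpus similarity score.

    Computes the mean cosine (normalized dot-product) similarity of the
    explicand's representation to a corpus of reference representations,
    minus the mean similarity to a contrasting foil set.
    """
    def normalize(v):
        # Scale each representation vector to unit length.
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    z = normalize(np.asarray(explicand_rep))        # shape (d,)
    corpus = normalize(np.asarray(corpus_reps))     # shape (n_corpus, d)
    foil = normalize(np.asarray(foil_reps))         # shape (n_foil, d)

    corpus_sim = corpus @ z                         # similarity to each corpus sample
    foil_sim = foil @ z                             # similarity to each foil sample
    return corpus_sim.mean() - foil_sim.mean()
```

In this framing, COCOA attributions would be obtained by treating this scalar as the model output to be explained and applying any post-hoc feature attribution method (e.g., gradient-based or perturbation-based attribution) to it with respect to the explicand's input features.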