A causal view of compositional zero-shot recognition

Paper title

A causal view of compositional zero-shot recognition

Authors

Yuval Atzmon, Felix Kreuk, Uri Shalit, Gal Chechik

Abstract

People easily recognize new visual categories that are new combinations of known components. This compositional generalization capacity is critical for learning in real-world domains like vision and language because the long tail of new combinations dominates the distribution. Unfortunately, learning systems struggle with compositional generalization because they often build on features that are correlated with class labels even if they are not "essential" for the class. This leads to consistent misclassification of samples from a new distribution, like new combinations of known components. Here we describe an approach for compositional generalization that builds on causal ideas. First, we describe compositional zero-shot learning from a causal perspective, and propose to view zero-shot inference as finding "which intervention caused the image?". Second, we present a causal-inspired embedding model that learns disentangled representations of elementary components of visual objects from correlated (confounded) training data. We evaluate this approach on two datasets for predicting new combinations of attribute-object pairs: A well-controlled synthesized images dataset and a real-world dataset which consists of fine-grained types of shoes. We show improvements compared to strong baselines.
