Paper Title

Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings

Paper Authors

Wei Yin, Yifan Liu, Chunhua Shen, Baichuan Sun, Anton van den Hengel

Paper Abstract

We propose an approach to semantic segmentation that achieves state-of-the-art supervised performance when applied in a zero-shot setting. It thus achieves results equivalent to those of the supervised methods, on each of the major semantic segmentation datasets, without training on those datasets. This is achieved by replacing each class label with a vector-valued embedding of a short paragraph that describes the class. The generality and simplicity of this approach enables merging multiple datasets from different domains, each with varying class labels and semantics. The resulting merged semantic segmentation dataset of over 2 million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets, despite not using any images therefrom. By fine-tuning the model on standard semantic segmentation datasets, we also achieve a significant improvement over the state-of-the-art supervised segmentation on NYUDv2 and PASCAL-Context, at 60% and 65% mIoU, respectively. Based on the closeness of language embeddings, our method can even segment unseen labels. Extensive experiments demonstrate strong generalization to unseen image domains and unseen labels, and that the method enables impressive performance improvements in downstream applications, including depth estimation and instance segmentation.
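The core idea of the abstract can be illustrated with a minimal sketch: each class label is replaced by the embedding of a short paragraph describing the class, and each pixel is assigned the class whose description embedding is closest (by cosine similarity) to the pixel's predicted feature vector. The paper uses a pretrained sentence encoder to produce these embeddings; in the hypothetical sketch below a deterministic random unit vector stands in for the encoder so the example is self-contained, and all names (`embed_description`, `segment`, etc.) are illustrative, not the authors' API.

```python
import numpy as np

EMBED_DIM = 8
rng = np.random.default_rng(0)

def embed_description(text: str) -> np.ndarray:
    """Stand-in for a sentence encoder: derive a deterministic seed from the
    text and return a random unit vector. A real system would use a
    pretrained language model here."""
    seed = sum(ord(c) * (i + 1) for i, c in enumerate(text)) % (2**32)
    local = np.random.default_rng(seed)
    v = local.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)

# Class labels (possibly from different datasets) are each described
# by a short paragraph; the descriptions, not the label strings, are embedded.
class_descriptions = {
    "road": "a paved surface on which vehicles drive",
    "sky": "the region of the atmosphere seen above the horizon",
    "person": "a human being walking or standing in the scene",
}
labels = list(class_descriptions)
class_embeds = np.stack(
    [embed_description(t) for t in class_descriptions.values()]
)  # (num_classes, EMBED_DIM)

def segment(pixel_features: np.ndarray) -> np.ndarray:
    """Assign each pixel the class whose description embedding has the
    highest cosine similarity with the pixel's feature vector."""
    feats = pixel_features / np.linalg.norm(pixel_features, axis=-1, keepdims=True)
    scores = feats @ class_embeds.T  # (H, W, num_classes)
    return scores.argmax(axis=-1)    # (H, W) label indices

# Fake network output: a 4x4 feature map whose vectors are slightly noisy
# copies of the class embeddings, so the true assignment is recoverable.
target = rng.integers(0, len(labels), size=(4, 4))
features = class_embeds[target] + 0.02 * rng.standard_normal((4, 4, EMBED_DIM))
pred = segment(features)
print((pred == target).all())
```

Because classes live in a shared embedding space rather than a fixed index set, adding a new (even unseen) label only requires embedding its description, which is what enables merging datasets with different label sets and segmenting unseen labels.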
