Paper Title
WAD-CMSN: Wasserstein Distance based Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval
Paper Authors
Abstract
Zero-shot sketch-based image retrieval (ZSSBIR), a widely studied branch of computer vision, has attracted considerable attention recently. Unlike sketch-based image retrieval (SBIR), the main aim of ZSSBIR is to retrieve natural images given free hand-drawn sketches whose categories may not appear during training. Previous approaches used semantically aligned sketch-image pairs or employed memory-expensive fusion layers to project the visual information into a low-dimensional subspace, ignoring the significant heterogeneous cross-domain discrepancy between highly abstract sketches and the corresponding images. This may yield poor performance in the training phase. To overcome this drawback, we propose a Wasserstein distance based cross-modal semantic network (WAD-CMSN) for ZSSBIR. Specifically, it first projects the visual information of each branch (sketch, image) into a common low-dimensional semantic subspace via the Wasserstein distance in an adversarial training manner. Furthermore, an identity matching loss is employed to select useful features, which not only captures complete semantic knowledge but also alleviates over-fitting in the WAD-CMSN model. Experimental results on the challenging Sketchy (Extended) and TU-Berlin (Extended) datasets demonstrate the effectiveness of the proposed WAD-CMSN model over several competitors.
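To build intuition for the alignment objective, the following is a minimal, self-contained sketch of the empirical 1-D Wasserstein-1 distance between two sets of projected embeddings. This is only an illustration of the distance itself, not the paper's model: WAD-CMSN estimates the distance in a high-dimensional semantic subspace via an adversarial critic, whereas here the "sketch" and "image" features are hypothetical 1-D projections where the closed-form (sort-and-average) solution applies.

```python
import numpy as np

def wasserstein_1d(u, v):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples.

    In 1-D with equal sample sizes, the optimal transport plan simply
    matches the sorted samples, so W1 is the mean absolute difference
    of the sorted values.
    """
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    return float(np.mean(np.abs(u - v)))

# Hypothetical sketch/image embeddings projected onto one semantic axis.
sketch_feats = [0.0, 1.0, 2.0]
image_feats = [1.0, 2.0, 3.0]
print(wasserstein_1d(sketch_feats, image_feats))  # -> 1.0
```

Minimizing such a distance between the sketch-branch and image-branch feature distributions is what drives the two modalities toward a common semantic subspace; in practice (and in the paper) this is done adversarially rather than in closed form.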