Paper Title


MMFL-Net: Multi-scale and Multi-granularity Feature Learning for Cross-domain Fashion Retrieval

Authors

Chen Bao, Xudong Zhang, Jiazhou Chen, Yongwei Miao

Abstract


Instance-level image retrieval in fashion is a challenging problem owing to its increasing importance in real-scenario visual fashion search. Cross-domain fashion retrieval aims to match unconstrained customer images, used as queries, against photographs provided by retailers; this is a difficult task because of the wide range of consumer-to-shop (C2S) domain discrepancies and because clothing images are vulnerable to various non-rigid deformations. To this end, we propose a novel multi-scale and multi-granularity feature learning network (MMFL-Net), which jointly learns global-local aggregation feature representations of clothing images in a unified framework, with the aim of training a cross-domain model for C2S fashion visual similarity. First, a new semantic-spatial feature fusion part is designed to bridge the semantic-spatial gap by applying top-down and bottom-up bidirectional multi-scale feature fusion. Next, a multi-branch deep network architecture is introduced to capture global salient, part-informed, and local detailed information, and to extract robust and discriminative feature embeddings by integrating similarity learning of coarse-to-fine embeddings at multiple granularities. Finally, MMFL-Net adopts an improved trihard loss, a center loss, and a multi-task classification loss, which jointly optimize intra-class and inter-class distances and thus explicitly improve intra-class compactness and inter-class discriminability among the learned visual representations. Furthermore, our proposed model combines a multi-task attribute recognition and classification module with multi-label semantic attributes and product ID labels. Experimental results demonstrate that the proposed MMFL-Net achieves significant improvements over state-of-the-art methods on two datasets, DeepFashion-C2S and Street2Shop.
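To make the training objective named in the abstract more concrete, the sketch below shows one common way to combine a batch-hard ("TriHard") triplet loss, a center loss, and a product-ID classification loss into a single objective. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the margin, loss weights (lambda_*), and function names are assumptions made for the example.

```python
# Minimal sketch (assumed, not from the paper): joint objective of
# batch-hard triplet loss + center loss + classification loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def trihard_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss: for each anchor, use its hardest positive
    and hardest negative within the mini-batch. Margin is an assumed value."""
    dist = torch.cdist(embeddings, embeddings, p=2)        # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)      # positive-pair mask
    hardest_pos = (dist * same.float()).max(dim=1).values  # farthest positive
    neg_dist = dist.clone()
    neg_dist[same] = float("inf")                          # exclude positives
    hardest_neg = neg_dist.min(dim=1).values               # closest negative
    return F.relu(hardest_pos - hardest_neg + margin).mean()

class CenterLoss(nn.Module):
    """Pull each embedding toward a learnable per-class center,
    improving intra-class compactness."""
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, embeddings, labels):
        return ((embeddings - self.centers[labels]) ** 2).sum(dim=1).mean()

def joint_loss(embeddings, logits, labels, center_loss_fn,
               lambda_tri=1.0, lambda_center=0.005, lambda_cls=1.0):
    """Weighted sum of the three objectives; weights are illustrative only."""
    l_tri = trihard_loss(embeddings, labels)
    l_center = center_loss_fn(embeddings, labels)
    l_cls = F.cross_entropy(logits, labels)                # product-ID classification
    return lambda_tri * l_tri + lambda_center * l_center + lambda_cls * l_cls
```

In a multi-branch, multi-granularity setup such as the one described above, a loss of this form would typically be applied to each branch's embedding (global, part-level, and local), so that coarse and fine representations are optimized jointly.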
