Paper Title

InstaFormer: Instance-Aware Image-to-Image Translation with Transformer

Paper Authors

Soohyun Kim, Jongbeom Baek, Jihye Park, Gyeongnyeon Kim, Seungryong Kim

Paper Abstract

We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, to effectively integrate global- and instance-level information. By considering extracted content features from an image as tokens, our networks discover global consensus of content features by considering context information through a self-attention module in Transformers. By augmenting such tokens with an instance-level feature extracted from the content feature with respect to bounding box information, our framework is capable of learning an interaction between object instances and the global image, thus boosting the instance-awareness. We replace layer normalization (LayerNorm) in standard Transformers with adaptive instance normalization (AdaIN) to enable a multi-modal translation with style codes. In addition, to improve the instance-awareness and translation quality at object regions, we present an instance-level content contrastive loss defined between input and translated image. We conduct experiments to demonstrate the effectiveness of our InstaFormer over the latest methods and provide extensive ablation studies.
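As a rough illustration of the AdaIN-in-Transformer idea described in the abstract, below is a minimal PyTorch sketch (not the authors' implementation) of a Transformer encoder block whose LayerNorm layers are replaced by adaptive instance normalization conditioned on a style code. All module and parameter names (StyleAdaIN, AdaINTransformerBlock, style_dim, and the example shapes) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: Transformer block with LayerNorm replaced by AdaIN,
# following the idea in the abstract. Names and shapes are illustrative only.
import torch
import torch.nn as nn

class StyleAdaIN(nn.Module):
    """Adaptive instance normalization over a token sequence, conditioned on a style code."""
    def __init__(self, dim, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm1d(dim, affine=False)
        self.to_scale_shift = nn.Linear(style_dim, 2 * dim)

    def forward(self, tokens, style):
        # tokens: (B, N, C) content tokens, style: (B, style_dim) style code
        gamma, beta = self.to_scale_shift(style).chunk(2, dim=-1)   # (B, C) each
        normed = self.norm(tokens.transpose(1, 2)).transpose(1, 2)  # per-channel instance norm
        return gamma.unsqueeze(1) * normed + beta.unsqueeze(1)

class AdaINTransformerBlock(nn.Module):
    """Pre-norm Transformer encoder block with AdaIN in place of LayerNorm."""
    def __init__(self, dim, style_dim, heads=8, mlp_ratio=4):
        super().__init__()
        self.adain1 = StyleAdaIN(dim, style_dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.adain2 = StyleAdaIN(dim, style_dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, tokens, style):
        h = self.adain1(tokens, style)
        tokens = tokens + self.attn(h, h, h, need_weights=False)[0]  # global self-attention
        tokens = tokens + self.mlp(self.adain2(tokens, style))       # style-modulated MLP
        return tokens

# Example: a 16x16 grid of 128-D content tokens modulated by an 8-D style code.
tokens = torch.randn(2, 256, 128)
style = torch.randn(2, 8)
out = AdaINTransformerBlock(dim=128, style_dim=8)(tokens, style)     # shape (2, 256, 128)
```

Because the per-channel scale and shift are predicted from a sampled style code, passing different style codes through the same block produces different outputs for the same content tokens, which is what enables the multi-modal translation mentioned in the abstract.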
