Paper Title
MatteFormer: Transformer-Based Image Matting via Prior-Tokens
Paper Authors
Paper Abstract
In this paper, we propose a transformer-based image matting model called MatteFormer, which takes full advantage of trimap information in the transformer block. Our method first introduces a prior-token, which is a global representation of each trimap region (i.e. foreground, background, and unknown). These prior-tokens are used as global priors and participate in the self-attention mechanism of each block. Each stage of the encoder is composed of PAST (Prior-Attentive Swin Transformer) blocks, which are based on the Swin Transformer block but differ in a couple of aspects: 1) Each has a PA-WSA (Prior-Attentive Window Self-Attention) layer, performing self-attention not only over spatial-tokens but also over prior-tokens. 2) Each has a prior-memory, which accumulates prior-tokens from the previous blocks and transfers them to the next block. We evaluate MatteFormer on the commonly used image matting datasets Composition-1k and Distinctions-646. Experimental results show that our proposed method achieves state-of-the-art performance by a large margin. Our code is available at https://github.com/webtoon/matteformer.
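The core idea described above can be sketched in a few lines: prior-tokens are pooled per trimap region and appended to the key/value set of window self-attention. This is a minimal NumPy illustration of that mechanism, not the authors' implementation; all function names, shapes, and the single-head, unweighted-mean pooling are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prior_tokens(feats, trimap):
    """Pool one global prior-token per trimap region (bg=0, unknown=1, fg=2).

    feats:  (N, C) spatial tokens; trimap: (N,) region labels.
    Returns (3, C) prior-tokens (zeros for an empty region).
    """
    toks = []
    for region in (0, 1, 2):
        mask = trimap == region
        toks.append(feats[mask].mean(axis=0) if mask.any()
                    else np.zeros(feats.shape[1]))
    return np.stack(toks)

def pa_wsa(window, priors, Wq, Wk, Wv):
    """Prior-attentive window self-attention (single head, illustrative).

    Queries come from spatial tokens only; keys/values additionally
    include the prior-tokens, so every spatial token can attend to the
    global region priors.
    """
    q = window @ Wq                         # (M, C) spatial queries
    kv = np.concatenate([window, priors])   # (M + 3, C) spatial + prior tokens
    k, v = kv @ Wk, kv @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))
    return attn @ v                         # (M, C) attended spatial tokens
```

In the paper's PAST blocks, the prior-tokens are additionally accumulated in a prior-memory across blocks; in this sketch that would amount to concatenating the prior-tokens produced by earlier blocks into `priors` before calling `pa_wsa`.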