Paper Title
MatteFormer: Transformer-Based Image Matting via Prior-Tokens
Paper Authors
Paper Abstract
In this paper, we propose a transformer-based image matting model called MatteFormer, which takes full advantage of trimap information in the transformer block. Our method first introduces a prior-token, which is a global representation of each trimap region (i.e. foreground, background, and unknown). These prior-tokens are used as global priors and participate in the self-attention mechanism of each block. Each stage of the encoder is composed of PAST (Prior-Attentive Swin Transformer) blocks, which are based on the Swin Transformer block but differ in a couple of aspects: 1) Each has a PA-WSA (Prior-Attentive Window Self-Attention) layer, performing self-attention not only over spatial-tokens but also over prior-tokens. 2) Each has a prior-memory, which accumulates prior-tokens from the previous blocks and transfers them to the next block. We evaluate MatteFormer on the commonly used image matting datasets Composition-1k and Distinctions-646. Experimental results show that our proposed method achieves state-of-the-art performance by a large margin. Our code is available at https://github.com/webtoon/matteformer.
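The core idea described above can be sketched in a few lines: prior-tokens are pooled per trimap region and appended to the key/value set of window self-attention. This is a minimal NumPy illustration of that mechanism, not the authors' implementation; all function names, shapes, and the single-head, unweighted-mean pooling are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prior_tokens(feats, trimap):
    """Pool one global prior-token per trimap region (bg=0, unknown=1, fg=2).

    feats:  (N, C) spatial tokens; trimap: (N,) region labels.
    Returns (3, C) prior-tokens (zeros for an empty region).
    """
    toks = []
    for region in (0, 1, 2):
        mask = trimap == region
        toks.append(feats[mask].mean(axis=0) if mask.any()
                    else np.zeros(feats.shape[1]))
    return np.stack(toks)

def pa_wsa(window, priors, Wq, Wk, Wv):
    """Prior-attentive window self-attention (single head, illustrative).

    Queries come from spatial tokens only; keys/values additionally
    include the prior-tokens, so every spatial token can attend to the
    global region priors.
    """
    q = window @ Wq                         # (M, C) spatial queries
    kv = np.concatenate([window, priors])   # (M + 3, C) spatial + prior tokens
    k, v = kv @ Wk, kv @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))
    return attn @ v                         # (M, C) attended spatial tokens
```

In the paper's PAST blocks, the prior-tokens are additionally accumulated in a prior-memory across blocks; in this sketch that would amount to concatenating the prior-tokens produced by earlier blocks into `priors` before calling `pa_wsa`.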