Paper Title
Pre-Trained Image Processing Transformer
Paper Authors
Paper Abstract
As the computing power of modern hardware increases rapidly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. This significant progress is mainly attributed to the representation ability of the transformer and its variant architectures. In this paper, we study low-level computer vision tasks (e.g., denoising, super-resolution, and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). To maximally excavate the capability of the transformer, we propose to utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs. The IPT model is trained on these images with multiple heads and multiple tails. In addition, contrastive learning is introduced so that the model adapts well to different image processing tasks. The pre-trained model can therefore be efficiently employed on a desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks. Code is available at https://github.com/huawei-noah/Pretrained-IPT and https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/IPT.
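To make the two ideas in the abstract concrete (synthesizing corrupted image pairs from clean ImageNet images, and training a shared transformer body with task-specific heads and tails), here is a minimal PyTorch-style sketch. All names (make_corrupted_pair, IPTSketch), degradation parameters, and module sizes are illustrative assumptions, not the authors' released implementation; see the repositories linked above for the real code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_corrupted_pair(clean, task):
    """Synthesize a (corrupted, clean) training pair from a clean batch
    of images (N, C, H, W) in [0, 1], mimicking how clean benchmark
    images can be degraded to build supervised pairs."""
    if task == "denoise":
        noisy = clean + 0.1 * torch.randn_like(clean)  # additive Gaussian noise
        return noisy.clamp(0.0, 1.0), clean
    if task == "sr_x2":
        # Bicubic downsampling: low-resolution input, high-resolution target.
        lr = F.interpolate(clean, scale_factor=0.5, mode="bicubic", align_corners=False)
        return lr.clamp(0.0, 1.0), clean
    raise ValueError(f"unknown task: {task}")

class IPTSketch(nn.Module):
    """Shared transformer body with one head/tail pair per task, so a
    single pre-trained body can be fine-tuned for each task."""
    def __init__(self, tasks, dim=64, depth=2, nheads=4):
        super().__init__()
        self.heads = nn.ModuleDict({t: nn.Conv2d(3, dim, 3, padding=1) for t in tasks})
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=nheads, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=depth)
        # A real super-resolution tail would also upsample; omitted here.
        self.tails = nn.ModuleDict({t: nn.Conv2d(dim, 3, 3, padding=1) for t in tasks})

    def forward(self, x, task):
        feat = self.heads[task](x)                # task-specific head
        n, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (N, H*W, C) spatial tokens
        tokens = self.body(tokens)                # shared transformer body
        feat = tokens.transpose(1, 2).reshape(n, c, h, w)
        return self.tails[task](feat)             # task-specific tail

model = IPTSketch(tasks=["denoise", "sr_x2"])
corrupted, target = make_corrupted_pair(torch.rand(2, 3, 48, 48), "denoise")
restored = model(corrupted, "denoise")            # same shape as target
```

The design choice the sketch illustrates is that only the heads and tails are task-specific; the body is shared across all degradations during pre-training, which is what allows one pre-trained model to be fine-tuned per task.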