DMTNET：双像素图像的动态多尺度网络与变压器Deblurring Deflurring

论文标题

DMTNET：双像素图像的动态多尺度网络与变压器Deblurring Deflurring

DMTNet: Dynamic Multi-scale Network for Dual-pixel Images Defocus Deblurring with Transformer

论文作者

Zhang, Dafeng, Wang, Xiaobing

论文摘要

最近的工作在使用卷积神经网络（CNN）的双像素数据基于defocus Deblurring任务方面取得了出色的成果，而数据的稀缺性限制了视觉变压器在此任务中的探索和尝试。此外，现有作品还使用固定参数和网络体系结构来删除具有不同分布和内容信息的图像，这也影响了模型的概括能力。在本文中，我们为双像素图像defocus deblurring提出了一个名为DMTNET的动态多尺度网络。 DMTNET主要包含两个模块：特征提取模块和重建模块。该特征提取模块由几个视觉变压器块组成，该模块使用其强大的特征提取能力来获得更丰富的功能并提高模型的鲁棒性。重建模块由几个动态多尺度子重构模块（DMSSRM）组成。 DMSSRM可以根据输入图像的模糊分布和内容信息自适应地分配权重来恢复图像。 DMTNET结合了变压器和CNN的优势，其中视觉变压器改善了CNN的性能上限，而CNN的电感偏置使变压器能够在不依赖大量数据的情况下提取更强大的功能。 DMTNET可能是首次使用Vision Transformer来恢复模糊的图像以清晰的尝试。通过与CNN结合，视觉变压器可以在小数据集上实现更好的性能。对流行基准测试的实验结果表明，我们的DMTNET明显优于最先进的方法。

Recent works achieve excellent results in defocus deblurring task based on dual-pixel data using convolutional neural network (CNN), while the scarcity of data limits the exploration and attempt of vision transformer in this task. In addition, the existing works use fixed parameters and network architecture to deblur images with different distribution and content information, which also affects the generalization ability of the model. In this paper, we propose a dynamic multi-scale network, named DMTNet, for dual-pixel images defocus deblurring. DMTNet mainly contains two modules: feature extraction module and reconstruction module. The feature extraction module is composed of several vision transformer blocks, which uses its powerful feature extraction capability to obtain richer features and improve the robustness of the model. The reconstruction module is composed of several Dynamic Multi-scale Sub-reconstruction Module (DMSSRM). DMSSRM can restore images by adaptively assigning weights to features from different scales according to the blur distribution and content information of the input images. DMTNet combines the advantages of transformer and CNN, in which the vision transformer improves the performance ceiling of CNN, and the inductive bias of CNN enables transformer to extract more robust features without relying on a large amount of data. DMTNet might be the first attempt to use vision transformer to restore the blurring images to clarity. By combining with CNN, the vision transformer may achieve better performance on small datasets. Experimental results on the popular benchmarks demonstrate that our DMTNet significantly outperforms state-of-the-art methods.

下载PDF全文

下载文献需遵守相关版权规定

论文标题