Paper Title
Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer
Paper Authors
Paper Abstract
Video denoising aims to recover high-quality frames from noisy video. Most existing approaches adopt convolutional neural networks~(CNNs) to separate the noise from the original visual content; however, CNNs focus on local information and ignore interactions between long-range regions within a frame. Furthermore, most related works directly take the output after basic spatio-temporal denoising as the final result, neglecting the fine-grained denoising process. In this paper, we propose a Dual-stage Spatial-Channel Transformer~(DSCT) for coarse-to-fine video denoising, which inherits the advantages of both Transformers and CNNs. Specifically, DSCT is built on a progressive dual-stage architecture, namely a coarse-level stage and a fine-level stage that extract dynamic features and static features, respectively. At both stages, a Spatial-Channel Encoding Module is designed to model long-range contextual dependencies at both the spatial and channel levels. Meanwhile, we design a Multi-Scale Residual Structure to preserve multiple aspects of information across stages, which contains a Temporal Features Aggregation Module to summarize the dynamic representation. Extensive experiments on four publicly available datasets demonstrate that our proposed method achieves significant improvements over state-of-the-art methods.
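The abstract's core idea of modeling long-range dependencies "at both spatial and channel levels" can be illustrated with a minimal numpy sketch. This is not the paper's actual Spatial-Channel Encoding Module (which the abstract does not specify in detail); it is a generic self-attention toy, with identity query/key/value projections assumed for brevity, applied once over spatial positions and once over channels:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # tokens: (N, d). Identity Q/K/V projections for illustration only.
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)  # (N, N) pairwise affinities
    return softmax(scores) @ tokens          # each token aggregates all others

def spatial_channel_encode(feat):
    """feat: (C, H, W) feature map.

    Spatial branch: each of the H*W positions attends to every other
    position, capturing long-range dependencies within the frame.
    Channel branch: each of the C channel maps attends to every other
    channel, capturing inter-channel dependencies.
    """
    C, H, W = feat.shape
    spatial = self_attention(feat.reshape(C, H * W).T).T.reshape(C, H, W)
    channel = self_attention(feat.reshape(C, H * W)).reshape(C, H, W)
    return spatial + channel  # fuse the two long-range views

feat = np.random.rand(4, 8, 8)
out = spatial_channel_encode(feat)
print(out.shape)  # (4, 8, 8)
```

In the paper's dual-stage design, a block like this would run at both the coarse and fine stages; the fusion shown here (simple addition) is an assumption, not the authors' stated mechanism.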