Title
Ghost-free High Dynamic Range Imaging with Context-aware Transformer
Authors
Abstract
High dynamic range (HDR) deghosting algorithms aim to generate ghost-free HDR images with realistic details. Restricted by the locality of the receptive field, existing CNN-based methods are typically prone to producing ghosting artifacts and intensity distortions in the presence of large motion and severe saturation. In this paper, we propose a novel Context-Aware Vision Transformer (CA-ViT) for ghost-free high dynamic range imaging. The CA-ViT is designed as a dual-branch architecture that jointly captures both global and local dependencies. Specifically, the global branch employs a window-based Transformer encoder to model long-range object movements and intensity variations, thereby addressing ghosting. For the local branch, we design a local context extractor (LCE) to capture short-range image features and use a channel attention mechanism to select informative local details across the extracted features, complementing the global branch. By incorporating the CA-ViT as the basic component, we further build the HDR-Transformer, a hierarchical network that reconstructs high-quality ghost-free HDR images. Extensive experiments on three benchmark datasets show that our approach outperforms state-of-the-art methods qualitatively and quantitatively with considerably reduced computational budgets. Code is available at https://github.com/megvii-research/HDR-Transformer
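The dual-branch idea described in the abstract can be sketched conceptually: a global branch applying self-attention within non-overlapping windows, and a local branch gating channels with squeeze-and-excitation-style attention, fused by addition. This is a minimal NumPy illustration, not the authors' implementation; all weights, shapes, and the fusion-by-addition choice here are placeholder assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, win=4):
    # Global branch (sketch): partition the feature map into win x win
    # windows and run plain self-attention inside each window.
    H, W, C = x.shape
    out = np.zeros_like(x)
    for i in range(0, H, win):
        for j in range(0, W, win):
            tokens = x[i:i + win, j:j + win].reshape(-1, C)   # (win*win, C)
            attn = softmax(tokens @ tokens.T / np.sqrt(C))    # token affinities
            out[i:i + win, j:j + win] = (attn @ tokens).reshape(win, win, C)
    return out

def channel_attention(x, reduction=2):
    # Local branch (sketch): squeeze-and-excitation-style channel gating.
    # The projection weights are random placeholders, not learned values.
    H, W, C = x.shape
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((C, C // reduction)) * 0.1
    w2 = rng.standard_normal((C // reduction, C)) * 0.1
    squeeze = x.mean(axis=(0, 1))                              # global avg pool: (C,)
    gate = 1.0 / (1.0 + np.exp(-(np.maximum(squeeze @ w1, 0) @ w2)))  # sigmoid
    return x * gate                                            # reweight channels

def ca_vit_block(x):
    # Dual-branch fusion: global window attention + channel-gated local features.
    return window_self_attention(x) + channel_attention(x)

x = np.random.default_rng(1).standard_normal((8, 8, 16))
y = ca_vit_block(x)
print(y.shape)  # (8, 8, 16)
```

The output keeps the input's spatial and channel shape, so such a block can be stacked hierarchically, as the abstract's HDR-Transformer does with its basic components.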