Paper Title
Swin Transformer coupling CNNs Makes Strong Contextual Encoders for VHR Image Road Extraction
Paper Authors
Paper Abstract
Accurately segmenting roads is challenging due to substantial intra-class variations, indistinct inter-class distinctions, and occlusions caused by shadows, trees, and buildings. To address these challenges, attention to important texture details and perception of global geometric contextual information are essential. Recent research has shown that CNN-Transformer hybrid structures outperform architectures that use a CNN or a Transformer alone: while CNNs excel at extracting local detail features, Transformers naturally perceive global contextual information. In this paper, we propose a dual-branch network block named ConSwin that combines ResNet and Swin Transformer for road extraction tasks. The ConSwin block harnesses the strengths of both approaches to better extract local detail and global features. Based on ConSwin, we construct an hourglass-shaped road extraction network and introduce two novel connection structures to better transmit texture and structural detail information to the decoder. Our proposed method outperforms state-of-the-art methods on both the Massachusetts and CHN6-CUG datasets in terms of overall accuracy, IoU, and F1 metrics. Additional experiments validate the effectiveness of the proposed module, while visualization results demonstrate its ability to obtain better road representations.
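For intuition, below is a minimal, hypothetical PyTorch sketch of a dual-branch block in the spirit of ConSwin: a ResNet-style convolutional branch captures local texture detail, an attention branch stands in for the Swin Transformer path (simplified here to global multi-head self-attention rather than true shifted-window attention), and the two outputs are fused by concatenation followed by a 1x1 convolution. The class name `DualBranchBlock`, the fusion scheme, and all hyperparameters are illustrative assumptions, not the paper's actual specification.

```python
import torch
import torch.nn as nn


class DualBranchBlock(nn.Module):
    """Illustrative sketch of a ConSwin-style dual-branch block.

    Assumptions (not from the paper): global self-attention is used as a
    stand-in for windowed Swin attention, and branch outputs are fused by
    concatenation + 1x1 convolution.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # CNN branch: a ResNet-style residual unit for local detail features
        self.cnn_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Attention branch: global self-attention over flattened spatial tokens
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Fusion: concatenate both branches, project back to `channels`
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Local path with residual connection, as in a ResNet basic block
        local_feat = torch.relu(self.cnn_branch(x) + x)
        # Global path: (B, C, H, W) -> (B, H*W, C) token sequence
        tokens = self.norm(x.flatten(2).transpose(1, 2))
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        # Fuse local detail and global context
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))


if __name__ == "__main__":
    block = DualBranchBlock(channels=64)
    out = block(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

In an hourglass-shaped encoder-decoder, blocks like this would be stacked in the encoder, with the paper's connection structures carrying the texture and structural details to the decoder; those structures are not reproduced in this sketch.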