Paper Title

Smoothing Matters: Momentum Transformer for Domain Adaptive Semantic Segmentation

Paper Authors

Runfa Chen, Yu Rong, Shangmin Guo, Jiaqi Han, Fuchun Sun, Tingyang Xu, Wenbing Huang

Paper Abstract

After the great success of Vision Transformer variants (ViTs) in computer vision, they have also shown great potential in domain adaptive semantic segmentation. Unfortunately, straightforwardly applying local ViTs to domain adaptive semantic segmentation does not bring the expected improvement. We find that the pitfall of local ViTs lies in the severe high-frequency components generated during both pseudo-label construction and feature alignment for the target domain. These high-frequency components make the training of local ViTs very unsmooth and hurt their transferability. In this paper, we introduce a low-pass filtering mechanism, the momentum network, to smooth the learning dynamics of target-domain features and pseudo labels. Furthermore, we propose a dynamic discrepancy measurement that aligns the distributions of the source and target domains via dynamic weights evaluating the importance of the samples. After tackling the above issues, extensive experiments on sim2real benchmarks show that the proposed method outperforms state-of-the-art methods. Our code is available at https://github.com/alpc91/TransDA
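The momentum network described in the abstract acts as a low-pass filter over the learning dynamics and is commonly realized as an exponential-moving-average (EMA) teacher of the student segmentation network. Below is a minimal PyTorch sketch of such an EMA update, assuming a generic student/teacher pair; the function name, momentum coefficient, and stand-in layer are illustrative assumptions and are not taken from the TransDA code.

```python
import copy
import torch

@torch.no_grad()
def momentum_update(student: torch.nn.Module,
                    teacher: torch.nn.Module,
                    m: float = 0.999) -> None:
    """EMA update of the teacher (momentum network).

    Because the teacher changes slowly relative to the student, it acts as a
    low-pass filter that smooths the target-domain features and pseudo labels
    it produces.
    """
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(m).add_(p_s, alpha=1.0 - m)

# Illustrative usage (hypothetical stand-in network, not the actual ViT segmenter):
# the teacher starts as a frozen copy of the student and is updated after each step.
student = torch.nn.Conv2d(3, 19, kernel_size=1)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

# ... one optimization step on `student` would go here ...
momentum_update(student, teacher, m=0.999)
```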
