Paper Title
AISFormer: Amodal Instance Segmentation with Transformer
Paper Authors
Paper Abstract
Amodal Instance Segmentation (AIS) aims to segment the regions of both the visible and possibly occluded parts of an object instance. While Mask R-CNN-based AIS approaches have shown promising results, they are unable to model high-level feature coherence due to their limited receptive field. The most recent transformer-based models show impressive performance on vision tasks, even better than Convolutional Neural Networks (CNNs). In this work, we present AISFormer, an AIS framework with a Transformer-based mask head. AISFormer explicitly models the complex coherence between occluder, visible, amodal, and invisible masks within an object's regions of interest by treating them as learnable queries. Specifically, AISFormer contains four modules: (i) feature encoding: extract ROI features and learn both short-range and long-range visual features; (ii) mask transformer decoding: generate the occluder, visible, and amodal mask query embeddings with a transformer decoder; (iii) invisible mask embedding: model the coherence between the amodal and visible masks; and (iv) mask predicting: estimate the output masks, including occluder, visible, amodal, and invisible. We conduct extensive experiments and ablation studies on three challenging benchmarks, i.e., KINS, D2SA, and COCOA-cls, to evaluate the effectiveness of AISFormer. The code is available at: https://github.com/UARK-AICV/AISFormer
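The four-module pipeline described in the abstract can be sketched as a PyTorch module. This is a minimal illustrative sketch, not the authors' implementation: all layer sizes, the conv stem, the number of decoder layers, and the linear invisible-mask combiner are assumptions made for clarity; only the overall structure (learnable queries decoded against ROI features, with the invisible mask derived from the amodal and visible embeddings) follows the abstract.

```python
import torch
import torch.nn as nn

class AISFormerHead(nn.Module):
    """Hypothetical sketch of a transformer-based amodal mask head,
    following the four modules named in the abstract. All dimensions
    and sub-layer choices here are illustrative assumptions."""

    def __init__(self, in_channels=256, d_model=64):
        super().__init__()
        # (i) feature encoding: a conv stem over ROI features
        # (the paper's encoder also captures long-range context).
        self.encode = nn.Conv2d(in_channels, d_model, kernel_size=3, padding=1)
        # Learnable queries for the occluder, visible, and amodal masks.
        self.mask_queries = nn.Embedding(3, d_model)
        # (ii) mask transformer decoding.
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        # (iii) invisible mask embedding: combines amodal + visible queries
        # (a plain linear combiner is an assumption of this sketch).
        self.invisible = nn.Linear(2 * d_model, d_model)

    def forward(self, roi_feats):
        # roi_feats: (B, C, H, W) region-of-interest features
        B = roi_feats.size(0)
        feats = self.encode(roi_feats)               # (B, d, H, W)
        mem = feats.flatten(2).transpose(1, 2)       # (B, H*W, d)
        q = self.mask_queries.weight.unsqueeze(0).expand(B, -1, -1)
        q = self.decoder(q, mem)                     # (B, 3, d)
        occluder, visible, amodal = q.unbind(dim=1)
        # (iii) invisible mask from amodal/visible coherence.
        invis = self.invisible(torch.cat([amodal, visible], dim=-1))
        embeds = torch.stack([occluder, visible, amodal, invis], dim=1)
        # (iv) mask predicting: dot product of each mask embedding
        # with the per-pixel ROI features.
        masks = torch.einsum("bqd,bdhw->bqhw", embeds, feats)
        return masks  # (B, 4, H, W): occluder, visible, amodal, invisible

head = AISFormerHead()
masks = head(torch.randn(2, 256, 14, 14))
print(masks.shape)  # torch.Size([2, 4, 14, 14])
```

Treating the four masks as query embeddings decoded against a shared feature map lets attention model their pairwise coherence, which a purely convolutional mask head with a limited receptive field cannot do.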