MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic Segmentation

Paper Title

MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic Segmentation

Authors

Zhenchao Jin, Dongdong Yu, Zehuan Yuan, Lequan Yu

Abstract

Co-occurring visual patterns make context aggregation an essential paradigm for semantic segmentation. Existing studies focus on modeling context within a single image while neglecting the valuable semantics of the corresponding categories beyond that image. To this end, we propose MCIBI++, a novel paradigm that soft-mines contextual information beyond the image to further boost pixel-level representations. Specifically, we first set up a dynamically updated memory module that stores dataset-level distribution information for each category, and then use this information to yield dataset-level category representations during the network's forward pass. After that, we generate a class probability distribution for each pixel representation and perform dataset-level context aggregation using that distribution as weights. Finally, the original pixel representations are augmented with both the aggregated dataset-level context and the conventional image-level context. Moreover, in the inference phase, we design a coarse-to-fine iterative inference strategy to further improve the segmentation results. MCIBI++ can be effortlessly incorporated into existing segmentation frameworks and brings consistent performance improvements. MCIBI++ can also be extended to video semantic segmentation frameworks, with considerable improvements over the baseline. Equipped with MCIBI++, we achieve state-of-the-art performance on seven challenging image or video semantic segmentation benchmarks.
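
The soft aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the memory holds one plain feature vector per class (the paper stores distribution information), that the memory is refreshed with a simple moving average, and that the final fusion is a concatenation. The names `update_memory`, `aggregate_dataset_context`, `augment_pixel`, and the `momentum` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_memory(memory, class_id, feature, momentum=0.9):
    """Assumed moving-average update of one class's stored vector."""
    old = memory[class_id]
    memory[class_id] = [
        momentum * o + (1.0 - momentum) * f for o, f in zip(old, feature)
    ]


def aggregate_dataset_context(class_logits, memory):
    """Weight each class vector by the pixel's class probability and sum."""
    probs = softmax(class_logits)
    context = [0.0] * len(memory[0])
    for prob, class_vector in zip(probs, memory):
        for d, value in enumerate(class_vector):
            context[d] += prob * value
    return context


def augment_pixel(pixel, dataset_context, image_context):
    """Fuse by concatenation (an assumption; the abstract does not say how)."""
    return pixel + dataset_context + image_context


# Two classes with 2-dimensional stored vectors.
memory = [[1.0, 0.0], [0.0, 1.0]]
# Equal logits give each class a weight of 0.5.
context = aggregate_dataset_context([0.0, 0.0], memory)
augmented = augment_pixel([0.2, 0.3], context, [0.1, 0.1])
```

Because the weights come from a full probability distribution rather than a hard class assignment, a pixel whose class is uncertain draws context from several class vectors at once, which is one reading of the "soft" in soft mining.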
