Paper Title

Video Polyp Segmentation: A Deep Learning Perspective

Authors

Ji, Ge-Peng, Xiao, Guobao, Chou, Yu-Cheng, Fan, Deng-Ping, Zhao, Kai, Chen, Geng, Van Gool, Luc

Abstract

We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. For years, progress in VPS has been hampered by the lack of large-scale, fine-grained segmentation annotations. To address this issue, we first introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158,690 colonoscopy frames from the well-known SUN database. We provide additional annotations of diverse types, i.e., attribute, object mask, boundary, scribble, and polygon. Second, we design a simple but efficient baseline, dubbed PNS+, consisting of a global encoder, a local encoder, and normalized self-attention (NS) blocks. The global and local encoders receive an anchor frame and multiple successive frames to extract long-term and short-term spatial-temporal representations, which are then progressively updated by two NS blocks. Extensive experiments show that PNS+ achieves the best performance and real-time inference speed (170 fps), making it a promising solution for the VPS task. Third, we extensively evaluate 13 representative polyp/object segmentation models on our SUN-SEG dataset and provide attribute-based comparisons. Finally, we discuss several open issues and suggest possible research directions for the VPS community.
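To make the abstract's architectural description concrete, the sketch below illustrates the general idea of a self-attention update over flattened spatio-temporal features, of the kind an NS block applies to the encoder outputs. This is a generic, simplified illustration under our own assumptions (shared features as query/key/value, plain soft-max normalization); the actual NS block in PNS+ involves additional mechanisms, such as channel grouping and relevance measuring, that are not reproduced here.

```python
import numpy as np

def self_attention_update(x):
    """Simplified self-attention over spatio-temporal features.

    x: array of shape (N, C), where N = T*H*W positions flattened
    from several successive frames and C is the channel dimension.
    NOTE: a generic sketch, not the exact PNS+ NS block.
    """
    q = k = v = x                              # shared features as Q/K/V for simplicity
    affinity = (q @ k.T) / np.sqrt(x.shape[1]) # scaled pairwise affinity (N, N)
    affinity -= affinity.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(affinity)
    weights /= weights.sum(axis=1, keepdims=True)    # soft-max over key positions
    return weights @ v                         # attention-weighted feature update
```

In PNS+, such attention-based updates are applied progressively: long-term representations from the global encoder (anchor frame) and short-term representations from the local encoder (successive frames) are fused through two consecutive NS blocks.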
