Paper Title
Unifying Training and Inference for Panoptic Segmentation
Paper Authors
Paper Abstract
We present an end-to-end network to bridge the gap between the training and inference pipelines for panoptic segmentation, a task that seeks to partition an image into semantic regions for "stuff" and object instances for "things". In contrast to recent works, our network exploits a parametrised yet lightweight panoptic segmentation submodule, powered by an end-to-end learnt dense instance affinity, to capture the probability that any pair of pixels belongs to the same instance. This panoptic submodule gives rise to a novel propagation mechanism for panoptic logits and enables the network to output a coherent panoptic segmentation map for both "stuff" and "thing" classes, without any post-processing. Reaping the benefits of end-to-end training, our full system sets new records on the popular street-scene dataset Cityscapes, achieving 61.4 PQ with a ResNet-50 backbone using only the fine annotations. On the challenging COCO dataset, our ResNet-50-based network also delivers state-of-the-art accuracy of 43.4 PQ. Moreover, our network flexibly works with and without object mask cues, performing competitively under both settings, which is of interest for applications with limited computation budgets.
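The core idea described in the abstract, propagating panoptic logits through a learnt dense instance affinity, can be illustrated with a minimal PyTorch sketch. This is not the paper's exact formulation: the function `propagate_panoptic_logits`, the dot-product affinity, and all tensor names below are assumptions made purely for illustration.

```python
import torch

def propagate_panoptic_logits(pixel_feats, thing_logits, stuff_logits):
    """Illustrative sketch (not the paper's exact method) of propagating
    instance logits through a dense pixel-pair affinity.

    pixel_feats:  (N, C) per-pixel embeddings, N = H * W pixels (assumed)
    thing_logits: (N, K) initial per-pixel logits for K "thing" instances
    stuff_logits: (N, S) per-pixel logits for S "stuff" classes
    """
    # Dense instance affinity: a score for every pixel pair, here modelled
    # as a softmax-normalised dot product of pixel embeddings (assumption).
    affinity = torch.softmax(pixel_feats @ pixel_feats.t(), dim=-1)  # (N, N)

    # Propagate the "thing" logits over the affinity so each pixel
    # aggregates evidence from pixels likely to share its instance.
    refined_thing = affinity @ thing_logits                          # (N, K)

    # Concatenate stuff and refined thing logits into one panoptic map;
    # a per-pixel argmax then yields a single coherent prediction
    # without any post-processing step.
    panoptic_logits = torch.cat([stuff_logits, refined_thing], dim=-1)
    return panoptic_logits.argmax(dim=-1)                            # (N,)
```

Because every operation in this sketch is differentiable, the affinity and the logits can be trained jointly end to end, which is the property the abstract highlights as bridging the training and inference pipelines.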