Paper Title
Simple and Efficient Architectures for Semantic Segmentation
Authors
Abstract
Though state-of-the-art architectures for semantic segmentation, such as HRNet, demonstrate impressive accuracy, the complexity arising from their salient design choices hinders a range of model acceleration tools, and they rely on operations that are inefficient on current hardware. This paper demonstrates that a simple encoder-decoder architecture with a ResNet-like backbone and a small multi-scale head performs on par with or better than complex semantic segmentation architectures such as HRNet, FANet, and DDRNets. Naively applying deep backbones designed for image classification to the task of semantic segmentation leads to sub-par results, owing to the much smaller effective receptive field of these backbones. Implicit among the various design choices put forth in works like HRNet, DDRNet, and FANet are networks with a large effective receptive field. It is natural to ask whether a simple encoder-decoder architecture would compare favorably if it were built on backbones that have a larger effective receptive field, without using inefficient operations such as dilated convolutions. We show that with minor and inexpensive modifications to ResNets that enlarge the receptive field, very simple and competitive baselines can be created for semantic segmentation. We present a family of such simple architectures for desktop as well as mobile targets, which match or exceed the performance of complex models on the Cityscapes dataset. We hope that our work provides simple yet effective baselines for practitioners to develop efficient semantic segmentation models.
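The receptive-field argument in the abstract can be illustrated with a small calculation. The sketch below computes the theoretical receptive field of a sequential stack of convolution or pooling layers, which is only an upper bound on the effective receptive field the abstract refers to. The function name `receptive_field` and the three layer stacks are hypothetical illustrations and are not taken from the paper.

```python
from typing import List, Tuple

# Each layer is described as (kernel_size, stride, dilation).
Layer = Tuple[int, int, int]


def receptive_field(layers: List[Layer]) -> int:
    """Theoretical receptive field (in input pixels) of a sequential stack."""
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance in input pixels between adjacent output positions
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks, assumed for illustration only.
stem: List[Layer] = [(7, 2, 1), (3, 2, 1)]            # 7x7 stride-2 conv, 3x3 stride-2 pool
plain = stem + [(3, 1, 1)] * 4                        # four 3x3 convs, no further downsampling
downsampled = stem + [(3, 2, 1)] + [(3, 1, 1)] * 3    # one extra stride-2 conv, then three 3x3 convs
dilated = stem + [(3, 1, 2)] * 4                      # four 3x3 convs with dilation 2

print(receptive_field(plain))        # 43
print(receptive_field(downsampled))  # 67
print(receptive_field(dilated))      # 75
```

With the same number of 3x3 layers, the extra stride-2 step enlarges the theoretical receptive field from 43 to 67 pixels without any dilated convolution, which is the kind of inexpensive change the abstract alludes to. The specific modifications the authors make to ResNets are not stated in the abstract and are not reproduced here.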