Paper Title
Increasing the Robustness of Semantic Segmentation Models with Painting-by-Numbers
Paper Authors
Paper Abstract
For safety-critical applications such as autonomous driving, CNNs have to be robust with respect to unavoidable image corruptions, such as image noise. While previous works addressed the task of robust prediction in the context of full-image classification, we consider it for dense semantic segmentation. We build upon an insight from image classification that output robustness can be improved by increasing the network bias towards object shapes. We present a new training schema that increases this shape bias. Our basic idea is to alpha-blend a portion of the RGB training images with fake images in which each class label is assigned a fixed, randomly chosen color that is unlikely to appear in real imagery. This forces the network to rely more strongly on shape cues. We call this data augmentation technique "Painting-by-Numbers". We demonstrate the effectiveness of our training schema for DeepLabv3+ with various network backbones, namely MobileNet-V2, ResNets, and Xception, and evaluate it on the Cityscapes dataset. Across our 16 different types of image corruptions and 5 different network backbones, we perform better than training with clean data in 74% of the cases. In the cases where we are worse than a model trained without our training schema, the difference is mostly only marginal. For some image corruptions, however, such as images with noise, we see a considerable performance gain of up to 25%.
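
A minimal sketch of the described augmentation, assuming a NumPy-based preprocessing pipeline: the function name paint_by_numbers, the blend weight alpha, and the fraction of augmented images apply_prob are illustrative assumptions, since the abstract does not state these hyperparameters.

import numpy as np

NUM_CLASSES = 19          # Cityscapes uses 19 evaluation classes
rng = np.random.default_rng(0)

# One fixed, randomly chosen color per class label (unlikely to occur in real imagery).
CLASS_COLORS = rng.integers(0, 256, size=(NUM_CLASSES, 3), dtype=np.uint8)


def paint_by_numbers(image: np.ndarray, label: np.ndarray,
                     alpha: float = 0.5, apply_prob: float = 0.5) -> np.ndarray:
    """Alpha-blend an RGB training image with a fake image colored by its class labels.

    image: (H, W, 3) uint8 RGB training image.
    label: (H, W) integer array of class ids in [0, NUM_CLASSES).
    alpha: blend weight of the fake image (hypothetical default).
    apply_prob: fraction of training images the augmentation is applied to (hypothetical).
    """
    if rng.random() > apply_prob:
        return image  # leave the remaining portion of training images untouched

    fake = CLASS_COLORS[label]  # (H, W, 3) class-colored fake image
    blended = alpha * fake.astype(np.float32) + (1.0 - alpha) * image.astype(np.float32)
    return blended.clip(0, 255).astype(np.uint8)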