An Empirical Method to Quantify the Peripheral Performance Degradation in Deep Networks

Paper Title

An Empirical Method to Quantify the Peripheral Performance Degradation in Deep Networks

Authors

Calden Wloka, John K. Tsotsos

Abstract

When applying a convolutional kernel to an image, if the output is to remain the same size as the input, then some form of padding is required around the image boundary. This means that for each layer of convolution in a convolutional neural network (CNN), a strip of pixels as wide as the half-width of the kernel is produced with a non-veridical representation. Although most CNN kernels are small to reduce the parameter load of a network, this non-veridical area compounds with each convolutional layer. The tendency toward deeper and deeper networks, combined with stride-based down-sampling, means that the propagation of this region can end up covering a non-negligible portion of the image. Although this issue with convolutions has been well acknowledged over the years, the impact of this degraded peripheral representation on modern network behavior has not been fully quantified. What are the limits of translation invariance? Does image padding successfully mitigate the issue, or is performance affected as an object moves between the image border and center? Using Mask R-CNN as an experimental model, we design a dataset and methodology to quantify the spatial dependency of network performance. Our dataset is constructed by inserting objects into high-resolution backgrounds, thereby allowing us to crop sub-images which place target objects at specific locations relative to the image border. By probing the behavior of Mask R-CNN across a selection of target locations, we see clear patterns of performance degradation near the image boundary, and in particular in the image corners. Quantifying both the extent and magnitude of this spatial anisotropy in network performance is important for the deployment of deep networks into unconstrained and realistic environments in which the location of objects or regions of interest is not guaranteed to be well localized within a given image.
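The compounding described in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes zero padding of `kernel // 2` on each side, looks only at the left (or top) edge, and uses a hypothetical four-layer stack rather than the Mask R-CNN backbone. The function name `contaminated_border` is likewise an illustrative choice. For each layer it counts how many feature-map positions next to the border depend on at least one padded value, then gives a nominal equivalent in input pixels by multiplying by the cumulative stride.

```python
import math


def contaminated_border(layers):
    """Track the padding-affected strip at the left/top edge.

    layers: list of (kernel_size, stride) pairs, each assumed to use
            zero padding of kernel_size // 2 (an assumption of this sketch).
    Returns one (width_in_feature_units, nominal_width_in_input_pixels)
    tuple per layer.
    """
    width = 0   # affected positions in the current feature map
    jump = 1    # cumulative stride: input pixels per feature-map step
    history = []
    for kernel, stride in layers:
        pad = kernel // 2
        # Output position j reads inputs starting at j * stride - pad, so it
        # is affected whenever j * stride - pad < width.
        width = math.ceil((width + pad) / stride)
        jump *= stride
        history.append((width, width * jump))
    return history


# Hypothetical stack: a 7x7 stride-2 stem, a 3x3 stride-2 layer,
# then two 3x3 stride-1 layers.
for depth, (units, pixels) in enumerate(
        contaminated_border([(7, 2), (3, 2), (3, 1), (3, 1)]), start=1):
    print(f"layer {depth}: {units} feature units, about {pixels} input pixels")
```

With stride 1 throughout, the strip grows by the kernel half-width per layer. Once strides are introduced, each affected feature-map position corresponds to a larger patch of the input, which is why the affected region can cover a noticeable fraction of the image in deep networks.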
