Paper Title

What can we learn from misclassified ImageNet images?

Paper Authors

Shixian Wen, Amanda Sofie Rios, Kiran Lekkala, Laurent Itti

Paper Abstract

Understanding the patterns of misclassified ImageNet images is particularly important, as it could guide us in designing deep neural networks (DNNs) that generalize better. However, the richness of ImageNet makes it difficult for researchers to visually find any useful patterns of misclassification. Here, to help find these patterns, we propose the "Superclassing ImageNet dataset". It is a subset of ImageNet consisting of 10 superclasses, each containing 7-116 related subclasses (e.g., 52 bird types, 116 dog types). By training neural networks on this dataset, we found that: (i) Misclassifications rarely occur across superclasses, but mainly among subclasses within a superclass. (ii) An ensemble of networks, each trained only on the subclasses of a given superclass, performs better than a single network trained on all subclasses of all superclasses. Hence, we propose a two-stage Super-Sub framework and demonstrate that: (i) The framework improves overall classification performance by 3.3%, by first inferring a superclass with a generalist superclass-level network and then using a specialized network for the final subclass-level classification. (ii) Although the total parameter storage cost grows to N+1 times that of a single network for N superclasses, finetuning, delta, and quantization-aware training techniques can reduce it to 0.2N+1 times. Another advantage of this efficient implementation is that the GPU memory cost during inference is equivalent to using only one network, because we instantiate each subclass-level network by adding small parameter variations (deltas) to the superclass-level network. (iii) Finally, our framework promises to be more scalable and generalizable than the common alternative of simply scaling up a vanilla network in size, since very large networks often suffer from overfitting and vanishing gradients.
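To make the two-stage Super-Sub inference and the delta-based weight sharing described in the abstract more concrete, below is a minimal PyTorch sketch. It is not the authors' released code: the ResNet-18 backbone, the helper `super_sub_predict`, and names such as `subclass_deltas` and `sub_heads` are illustrative assumptions, and the deltas here are zero placeholders, whereas in the paper they are learned with finetuning and quantization-aware training.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


@torch.no_grad()
def super_sub_predict(x, model, super_head, sub_heads, subclass_deltas):
    """Two-stage inference on one preprocessed image batch x (1 x 3 x 224 x 224).

    `model` carries the superclass-level (generalist) weights; `subclass_deltas[i]`
    maps backbone parameter names to small tensors that turn the generalist into
    the specialist for superclass i, so only one network's worth of weights ever
    sits on the GPU.
    """
    # Stage 1: generalist superclass-level prediction (e.g., "bird" vs. "dog").
    model.fc = super_head
    super_id = model(x).argmax(dim=1).item()

    # Stage 2: add this superclass's deltas, swap in its subclass head, and
    # classify among its subclasses (e.g., the 52 bird types).
    params = dict(model.named_parameters())
    for name, delta in subclass_deltas[super_id].items():
        params[name].add_(delta)
    model.fc = sub_heads[super_id]
    sub_id = model(x).argmax(dim=1).item()

    # Subtract the deltas again so the generalist weights are restored.
    for name, delta in subclass_deltas[super_id].items():
        params[name].sub_(delta)
    return super_id, sub_id


# Illustrative setup: 10 superclasses with hypothetical subclass counts
# (the dataset has 7-116 subclasses per superclass).
backbone = resnet18()
backbone.eval()
n_subclasses = [52, 116] + [20] * 8
super_head = nn.Linear(backbone.fc.in_features, 10)
sub_heads = [nn.Linear(super_head.in_features, n) for n in n_subclasses]
subclass_deltas = [
    {name: torch.zeros_like(p)                       # zero placeholders; learned in practice
     for name, p in backbone.named_parameters()
     if not name.startswith("fc")}
    for _ in n_subclasses
]

x = torch.randn(1, 3, 224, 224)                      # stand-in for a preprocessed image
super_id, sub_id = super_sub_predict(x, backbone, super_head, sub_heads, subclass_deltas)
```

Because only the small per-superclass deltas and classification heads differ between specialists, they can be stored compactly (consistent with the 0.2N+1 storage figure above), while the GPU holds just one network's worth of weights at inference time.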
