Paper Title
The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes
Paper Authors
Paper Abstract
Convolutional neural networks were the standard for solving many computer vision tasks until recently, when Transformers and MLP-based architectures started to show competitive performance. These architectures typically have a vast number of weights and need to be trained on massive datasets; hence, they are not suitable for low-data regimes. In this work, we propose a simple yet effective framework to improve generalization from small amounts of data. We augment modern CNNs with fully-connected (FC) layers and show the massive impact this architectural change has in low-data regimes. We further present an online joint knowledge-distillation method that utilizes the extra FC layers at train time but avoids them at test time. This allows us to improve the generalization of a CNN-based model without any increase in the number of weights at test time. We perform classification experiments for a large range of network backbones and several standard datasets in supervised-learning and active-learning settings. Our models significantly outperform networks without fully-connected layers, reaching a relative improvement of up to $16\%$ in validation accuracy in the supervised setting without adding any extra parameters during inference.
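The abstract describes training a network with an extra FC-augmented head alongside a plain head, using online joint knowledge distillation so only the plain head is kept at inference. As a minimal sketch of what such a joint objective could look like (the function name, temperature `T`, and weighting `alpha` are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def joint_distillation_loss(logits_fc, logits_plain, labels, T=2.0, alpha=0.5):
    """Hypothetical sketch of an online joint distillation objective:
    both heads minimize cross-entropy on the labels, and the plain head
    (kept at inference) additionally matches the softened predictions of
    the FC-augmented head (used only at train time)."""
    n = labels.shape[0]
    ce = lambda logits: -np.log(softmax(logits)[np.arange(n), labels] + 1e-12).mean()
    p_teacher = softmax(logits_fc, T)     # FC-augmented head, train-time only
    p_student = softmax(logits_plain, T)  # plain head, retained at test time
    kl = (p_teacher * (np.log(p_teacher + 1e-12)
                       - np.log(p_student + 1e-12))).sum(axis=-1).mean()
    # T**2 rescaling keeps the distillation gradient magnitude comparable
    # across temperatures, as is common in knowledge-distillation losses.
    return ce(logits_fc) + ce(logits_plain) + alpha * (T ** 2) * kl
```

When the two heads agree exactly, the KL term vanishes and the loss reduces to the sum of the two cross-entropy terms; the mismatch between heads is what drives the distillation signal during training.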