Paper Title
The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes
Paper Authors
Paper Abstract
Convolutional neural networks were the standard for solving many computer vision tasks until recently, when Transformers and MLP-based architectures started to show competitive performance. These architectures typically have a vast number of weights and need to be trained on massive datasets; hence, they are not suitable for low-data regimes. In this work, we propose a simple yet effective framework to improve generalization from small amounts of data. We augment modern CNNs with fully-connected (FC) layers and show the massive impact this architectural change has in low-data regimes. We further present an online joint knowledge-distillation method that utilizes the extra FC layers at train time but avoids them at test time. This allows us to improve the generalization of a CNN-based model without any increase in the number of weights at test time. We perform classification experiments for a large range of network backbones and several standard datasets in supervised-learning and active-learning settings. Our models significantly outperform networks without fully-connected layers, reaching a relative improvement of up to $16\%$ in validation accuracy in the supervised setting without adding any extra parameters during inference.
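The abstract describes training a network with an extra FC-augmented head alongside a plain head, using online joint knowledge distillation so only the plain head is kept at inference. As a minimal sketch of what such a joint objective could look like (the function name, temperature `T`, and weighting `alpha` are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def joint_distillation_loss(logits_fc, logits_plain, labels, T=2.0, alpha=0.5):
    """Hypothetical sketch of an online joint distillation objective:
    both heads minimize cross-entropy on the labels, and the plain head
    (kept at inference) additionally matches the softened predictions of
    the FC-augmented head (used only at train time)."""
    n = labels.shape[0]
    ce = lambda logits: -np.log(softmax(logits)[np.arange(n), labels] + 1e-12).mean()
    p_teacher = softmax(logits_fc, T)     # FC-augmented head, train-time only
    p_student = softmax(logits_plain, T)  # plain head, retained at test time
    kl = (p_teacher * (np.log(p_teacher + 1e-12)
                       - np.log(p_student + 1e-12))).sum(axis=-1).mean()
    # T**2 rescaling keeps the distillation gradient magnitude comparable
    # across temperatures, as is common in knowledge-distillation losses.
    return ce(logits_fc) + ce(logits_plain) + alpha * (T ** 2) * kl
```

When the two heads agree exactly, the KL term vanishes and the loss reduces to the sum of the two cross-entropy terms; the mismatch between heads is what drives the distillation signal during training.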