Paper Title


A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels

Paper Authors

Emeson Santana, Gustavo Carneiro, Filipe R. Cordeiro

Paper Abstract


Label noise is common in large real-world datasets, and its presence harms the training process of deep neural networks. Although several works have focused on training strategies to address this problem, few studies evaluate the impact of data augmentation as a design choice for training deep neural networks. In this work, we analyse model robustness when using different data augmentations and their improvement on training in the presence of noisy labels. We evaluate state-of-the-art and classical data augmentation strategies with different levels of synthetic noise on the datasets MNIST, CIFAR-10, and CIFAR-100, and on the real-world dataset Clothing1M. We evaluate the methods using the accuracy metric. Results show that the appropriate selection of data augmentation can drastically improve model robustness to label noise, yielding up to a 177.84% relative improvement in best test accuracy compared to the baseline with no augmentation, and an increase of up to 6% in absolute value with the state-of-the-art DivideMix training strategy.
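The abstract refers to "different levels of synthetic noise" without specifying the injection protocol. A common setup in this literature is symmetric (uniform) label noise, where a fixed fraction of labels is flipped to a different class chosen uniformly at random. The sketch below illustrates that idea; the function name and exact sampling scheme are assumptions for illustration, not the paper's exact protocol.

```python
import random

def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a fraction `noise_rate` of labels to a different class,
    drawn uniformly from the remaining classes (symmetric label noise)."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_noisy = int(noise_rate * len(labels))
    # Pick which samples to corrupt, without replacement.
    for i in rng.sample(range(len(labels)), n_noisy):
        other_classes = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(other_classes)
    return noisy

# Toy 10-class dataset: with noise_rate=0.4, exactly 40% of labels flip.
clean = [i % 10 for i in range(1000)]
noisy = inject_symmetric_noise(clean, noise_rate=0.4, num_classes=10)
print(sum(c != n for c, n in zip(clean, noisy)))  # → 400
```

Asymmetric noise (flipping only between visually similar class pairs) is the other common variant; the uniform scheme above is the simpler of the two and is typically what "X% synthetic noise" denotes on MNIST/CIFAR benchmarks.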
