Paper Title
Regularizing Deep Networks with Semantic Data Augmentation
Paper Authors
Paper Abstract
Data augmentation is widely known as a simple yet surprisingly effective technique for regularizing deep networks. Conventional data augmentation schemes, e.g., flipping, translation or rotation, are low-level, data-independent and class-agnostic operations, leading to limited diversity for augmented samples. To this end, we propose a novel semantic data augmentation algorithm to complement traditional approaches. The proposed method is inspired by the intriguing property that deep networks are effective in learning linearized features, i.e., certain directions in the deep feature space correspond to meaningful semantic transformations, e.g., changing the background or view angle of an object. Based on this observation, translating training samples along many such directions in the feature space can effectively augment the dataset with more diversity. To implement this idea, we first introduce a sampling-based method to obtain semantically meaningful directions efficiently. Then, an upper bound of the expected cross-entropy (CE) loss on the augmented training set is derived by letting the number of augmented samples go to infinity, yielding a highly efficient algorithm. In fact, we show that the proposed implicit semantic data augmentation (ISDA) algorithm amounts to minimizing a novel robust CE loss, which adds minimal extra computational cost to a normal training procedure. In addition to supervised learning, ISDA can be applied to semi-supervised learning tasks under the consistency regularization framework, where ISDA amounts to minimizing the upper bound of the expected KL-divergence between the augmented features and the original features. Despite its simplicity, ISDA consistently improves the generalization performance of popular deep models (e.g., ResNets and DenseNets) on a variety of datasets, i.e., CIFAR-10, CIFAR-100, SVHN, ImageNet, and Cityscapes.
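To make the "robust CE loss" idea concrete, the following is a minimal NumPy sketch of a surrogate loss of the kind the abstract describes: each logit difference is inflated by a quadratic term involving the per-class feature covariance, so no augmented samples are ever materialized. All names here (`W`, `b`, `covs`, `lam`) are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

def isda_robust_ce(features, labels, W, b, covs, lam):
    """Sketch of an ISDA-style robust CE upper bound.

    features: (N, d) deep features, labels: (N,) int class ids,
    W: (C, d) classifier weights, b: (C,) biases,
    covs: (C, d, d) per-class feature covariance estimates,
    lam: strength of the implicit semantic augmentation.
    For each sample with label y, the logit difference z_j - z_y is
    augmented by 0.5 * lam * (w_j - w_y)^T Sigma_y (w_j - w_y).
    """
    N = features.shape[0]
    losses = np.empty(N)
    for i in range(N):
        y = labels[i]
        dw = W - W[y]                                  # rows: w_j - w_y, shape (C, d)
        logit_diff = features[i] @ dw.T + (b - b[y])   # z_j - z_y for all classes j
        quad = 0.5 * lam * np.einsum('jd,de,je->j', dw, covs[y], dw)
        losses[i] = np.log(np.sum(np.exp(logit_diff + quad)))
    return losses.mean()
```

With `lam = 0` the quadratic term vanishes and the expression reduces to the standard cross-entropy loss, which matches the abstract's claim that ISDA adds minimal cost on top of normal training.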