Paper Title

SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

Authors

Ting-Yao Hu, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel

Abstract

Data augmentation methods usually apply the same augmentation (or a mix of augmentations) to all training samples. For example, to perturb data with noise, the noise is sampled from a Normal distribution with a fixed standard deviation for all samples. We hypothesize that a hard sample with high training loss already provides a strong training signal to update the model parameters and should be perturbed with mild or no augmentation. Perturbing a hard sample with a strong augmentation may also make it too hard to learn from. Conversely, a sample with low training loss should be perturbed by a stronger augmentation to provide more robustness to a variety of conditions. To formalize these intuitions, we propose a novel method to learn a Sample-Adaptive Policy for Augmentation -- SapAugment. Our policy adapts the augmentation parameters based on the training loss of the data samples. In the example of Gaussian noise, a hard sample will be perturbed with low-variance noise and an easy sample with high-variance noise. Furthermore, the proposed method combines multiple augmentation methods into a methodical policy-learning framework and obviates hand-crafting augmentation parameters by trial and error. We apply our method to an automatic speech recognition (ASR) task, combining existing and novel augmentations using the proposed framework. We show substantial improvement, up to a 21% relative reduction in word error rate on the LibriSpeech dataset, over the state-of-the-art speech augmentation method.
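To make the loss-adaptive augmentation idea concrete, below is a minimal Python (PyTorch) sketch of the Gaussian-noise example from the abstract: each sample's noise scale shrinks as its training loss grows. The rank-based linear mapping, the function name sample_adaptive_noise, and the parameter sigma_max are illustrative assumptions; SapAugment learns its loss-to-augmentation policy rather than fixing it by hand.

```python
import torch

def sample_adaptive_noise(x, losses, sigma_max=0.2):
    """Add Gaussian noise whose std decreases with per-sample training loss.

    x:      (batch, ...) input features
    losses: (batch,) per-sample training losses from the current model
    """
    batch = losses.numel()
    # Rank-normalize losses to [0, 1]: hardest sample -> 0, easiest -> 1.
    order = torch.argsort(losses, descending=True)
    ranks = torch.empty_like(losses)
    ranks[order] = torch.arange(batch, dtype=losses.dtype, device=losses.device)
    ease = ranks / max(batch - 1, 1)

    # Placeholder policy: noise std grows linearly with "ease".
    # (The paper learns this mapping; linear is for illustration only.)
    sigmas = sigma_max * ease
    shape = (batch,) + (1,) * (x.dim() - 1)  # broadcast over feature dims
    return x + torch.randn_like(x) * sigmas.view(shape)
```

Under this toy policy, the hardest sample in the batch receives no noise while the easiest receives noise with std sigma_max, matching the intuition described above.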
