G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR

Paper title

G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR

Authors

Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park

Abstract

Data augmentation is a ubiquitous technique used to provide robustness to automatic speech recognition (ASR) training. However, even as so much of the ASR training process has become automated and more "end-to-end", the data augmentation policy (what augmentation functions to use, and how to apply them) remains hand-crafted. We present Graph-Augment, a technique to define the augmentation space as directed acyclic graphs (DAGs) and search over this space to optimize the augmentation policy itself. We show that given the same computational budget, policies produced by G-Augment are able to perform better than SpecAugment policies obtained by random search on fine-tuning tasks on CHiME-6 and AMI. G-Augment is also able to establish a new state-of-the-art ASR performance on the CHiME-6 evaluation set (30.7% WER). We further demonstrate that G-Augment policies show better transfer properties across warm-start to cold-start training and model size compared to random-searched SpecAugment policies.
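To make the idea of a DAG-structured augmentation policy concrete, the following is a minimal toy sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the three operations (`identity`, `gain`, `time_mask`), the rule that a node merges its parents by averaging, and the helper names `sample_policy`, `apply_policy`, and `random_search`. The abstract does not say which search algorithm G-Augment uses, so plain random search stands in here.

```python
import random

# Toy augmentation ops on a non-empty 1-D signal (list of floats).
# These ops are hypothetical stand-ins, not the paper's search space.
def identity(x, rng):
    return list(x)

def gain(x, rng):
    g = rng.uniform(0.5, 1.5)
    return [g * v for v in x]

def time_mask(x, rng):
    # Zero out one random contiguous span of at most max(1, len(x) // 4) samples.
    width = rng.randint(0, max(1, len(x) // 4))
    start = rng.randint(0, len(x) - width)
    return [0.0 if start <= i < start + width else v for i, v in enumerate(x)]

OPS = {"identity": identity, "gain": gain, "time_mask": time_mask}

def sample_policy(num_nodes, rng):
    """Sample a random DAG. Node 0 is the raw input; node i (i >= 1) gets an
    op and one or two parents drawn from nodes with a smaller index, which
    guarantees the graph is acyclic."""
    policy = []
    for i in range(1, num_nodes + 1):
        k = rng.randint(1, min(2, i))
        parents = sorted(rng.sample(range(i), k))
        policy.append({"op": rng.choice(sorted(OPS)), "parents": parents})
    return policy

def apply_policy(policy, x, rng):
    """Evaluate nodes in index order and return the last node's output."""
    outputs = [list(x)]
    for node in policy:
        # Merge parent outputs by element-wise averaging (an assumed rule),
        # then apply this node's op to the merged signal.
        parent_signals = [outputs[p] for p in node["parents"]]
        merged = [sum(vals) / len(vals) for vals in zip(*parent_signals)]
        outputs.append(OPS[node["op"]](merged, rng))
    return outputs[-1]

def random_search(score_fn, num_trials, num_nodes, seed=0):
    """Keep the sampled policy with the lowest score, e.g. a validation WER.
    Random search is a stand-in for whatever search the paper uses."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(num_trials):
        policy = sample_policy(num_nodes, rng)
        score = score_fn(policy)
        if score < best_score:
            best, best_score = policy, score
    return best, best_score
```

In a real setting `score_fn` would train or fine-tune an ASR model with the candidate policy and return its word error rate on a held-out set; here it is left as a caller-supplied function so the sketch stays self-contained.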
