Paper Title

Efficient acoustic feature transformation in mismatched environments using a Guided-GAN

Authors

Walter Heymans, Marelie H. Davel, Charl van Heerden

Abstract

We propose a new framework to improve automatic speech recognition (ASR) systems in resource-scarce environments using a generative adversarial network (GAN) operating on acoustic input features. The GAN is used to enhance the features of mismatched data prior to decoding, or can optionally be used to fine-tune the acoustic model. We achieve improvements that are comparable to multi-style training (MTR), but at a lower computational cost. With less than one hour of data, an ASR system trained on good-quality data and evaluated on mismatched audio is improved by between 11.5% and 19.7% relative word error rate (WER). Experiments demonstrate that the framework can be very useful in under-resourced environments where training data and computational resources are limited. The GAN does not require parallel training data, because it utilises a baseline acoustic model to provide an additional loss term that guides the generator to create acoustic features that are better classified by the baseline.
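As a rough illustration of the idea in the last sentence, the generator objective can be sketched as an adversarial term plus a guidance term computed by a frozen baseline acoustic model. This is not the authors' implementation: the module sizes, the weighting factor `lambda_guide`, and the source of the frame-level targets (e.g. alignments of the transcribed, non-parallel adaptation data) are all assumptions for the sketch.

```python
# Minimal sketch (assumed, not the paper's code) of a "guided" generator loss:
# adversarial loss + a guidance term from a frozen baseline acoustic model.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM = 40      # e.g. filterbank feature dimension (assumed)
NUM_STATES = 100   # number of acoustic-model output states (assumed)

# Generator maps mismatched features to enhanced features.
generator = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(), nn.Linear(256, FEAT_DIM))
# Discriminator scores features as real (clean) vs. generated.
discriminator = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
# Stand-in for the baseline acoustic model; it only guides and is not updated.
baseline_am = nn.Sequential(nn.Linear(FEAT_DIM, NUM_STATES))
for p in baseline_am.parameters():
    p.requires_grad = False

lambda_guide = 1.0  # weight of the guidance term (assumed)

def generator_loss(mismatched_feats, target_states):
    """Adversarial loss plus guidance loss from the frozen baseline model."""
    enhanced = generator(mismatched_feats)

    # Adversarial term: make enhanced features look like clean ("real") features.
    d_fake = discriminator(enhanced)
    adv_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

    # Guidance term: steer the generator toward features that the baseline
    # acoustic model classifies correctly (targets assumed to come from
    # alignments of the non-parallel adaptation data).
    logits = baseline_am(enhanced)
    guide_loss = F.cross_entropy(logits, target_states)

    return adv_loss + lambda_guide * guide_loss

# Example usage with random data (batch of 8 feature frames).
feats = torch.randn(8, FEAT_DIM)
states = torch.randint(0, NUM_STATES, (8,))
loss = generator_loss(feats, states)
loss.backward()
```

The point the abstract highlights is that the extra loss term comes from the existing baseline model rather than from paired clean/noisy recordings, which is why no parallel training data is required.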
