Pruned RNN-T for fast, memory-efficient ASR training

Paper title

Pruned RNN-T for fast, memory-efficient ASR training

Authors

Fangjun Kuang, Liyong Guo, Wei Kang, Long Lin, Mingshuang Luo, Zengwei Yao, Daniel Povey

Abstract

The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition. One of the drawbacks of RNN-T is that its loss function is relatively slow to compute, and can use a lot of memory. Excessive GPU memory usage can make it impractical to use RNN-T loss in cases where the vocabulary size is large: for example, for Chinese character-based ASR. We introduce a method for faster and more memory-efficient RNN-T loss computation. We first obtain pruning bounds for the RNN-T recursion using a simple joiner network that is linear in the encoder and decoder embeddings; we can evaluate this without using much memory. We then use those pruning bounds to evaluate the full, non-linear joiner network.
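
The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the simple joiner adds per-frame encoder logits `am` to per-position decoder logits `lm`, so the log-probabilities needed by the recursion can be computed without building a (T, U+1, V) tensor. The function names, the tanh joiner, and the diagonal-band choice of pruning bounds are all hypothetical placeholders; the abstract does not say how the bounds are derived from the simple joiner.

```python
import numpy as np

def simple_joiner_logprobs(am, lm, symbols, blank=0):
    """am: (T, V) encoder logits, lm: (U+1, V) decoder logits, symbols: (U,) target ids.
    Returns blank log-probs (T, U+1) and target-symbol log-probs (T, U)
    without materializing a (T, U+1, V) tensor."""
    am_max = am.max(axis=1, keepdims=True)            # (T, 1), for numerical stability
    lm_max = lm.max(axis=1, keepdims=True)            # (U+1, 1)
    # logsumexp over v of am[t, v] + lm[u, v], done as one matrix product
    normalizer = (np.log(np.exp(am - am_max) @ np.exp(lm - lm_max).T)
                  + am_max + lm_max.T)                # (T, U+1)
    blank_lp = am[:, [blank]] + lm[:, blank][None, :] - normalizer
    u = np.arange(len(symbols))
    sym_lp = am[:, symbols] + lm[u, symbols][None, :] - normalizer[:, :-1]
    return blank_lp, sym_lp

def naive_logprobs(am, lm, symbols, blank=0):
    """Reference version that does build the full (T, U+1, V) lattice."""
    logits = am[:, None, :] + lm[None, :, :]
    m = logits.max(axis=2, keepdims=True)
    logp = logits - (m + np.log(np.exp(logits - m).sum(axis=2, keepdims=True)))
    return logp[:, :, blank], logp[:, np.arange(len(symbols)), symbols]

def pruned_full_joiner(enc, dec, starts, S, W):
    """enc: (T, D), dec: (U+1, D), starts: (T,) first kept position per frame,
    W: (D, V). Evaluates a (hypothetical) non-linear joiner on S positions
    per frame only, giving (T, S, V) instead of (T, U+1, V)."""
    idx = starts[:, None] + np.arange(S)[None, :]     # (T, S) kept positions
    hidden = np.tanh(enc[:, None, :] + dec[idx])      # (T, S, D)
    return hidden @ W                                 # (T, S, V)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, U, V, D, S = 50, 20, 500, 16, 5
    am, lm = rng.normal(size=(T, V)), rng.normal(size=(U + 1, V))
    symbols = rng.integers(1, V, size=U)
    fast = simple_joiner_logprobs(am, lm, symbols)
    slow = naive_logprobs(am, lm, symbols)
    print("factored result matches naive:", all(np.allclose(a, b) for a, b in zip(fast, slow)))

    # Placeholder bounds: a diagonal band, NOT the criterion used in the paper.
    starts = np.round(np.linspace(0, U + 1 - S, T)).astype(int)
    enc, dec = rng.normal(size=(T, D)), rng.normal(size=(U + 1, D))
    logits = pruned_full_joiner(enc, dec, starts, S, rng.normal(size=(D, V)))
    print("pruned logits shape:", logits.shape)
    print("elements, full vs pruned:", T * (U + 1) * V, "vs", T * S * V)
```

The saving comes from the last dimension product: the full lattice grows with T × (U+1) × V, while the pruned one grows with T × S × V for a fixed window S, which matters most when the vocabulary V is large.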
