Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment

Paper Title

Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment

Authors

Ethan A. Chi, Julian Salazar, Katrin Kirchhoff

Abstract

Non-autoregressive models greatly improve decoding speed over typical sequence-to-sequence models, but suffer from degraded performance. Infilling and iterative refinement models make up some of this gap by editing the outputs of a non-autoregressive model, but are constrained in the edits that they can make. We propose iterative realignment, where refinements occur over latent alignments rather than output sequence space. We demonstrate this in speech recognition with Align-Refine, an end-to-end Transformer-based model which refines connectionist temporal classification (CTC) alignments to allow length-changing insertions and deletions. Align-Refine outperforms Imputer and Mask-CTC, matching an autoregressive baseline on WSJ at 1/14th the real-time factor and attaining a LibriSpeech test-other WER of 9.0% without an LM. Our model is strong even in one iteration with a shallower decoder.
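
To make concrete why editing a CTC alignment can change the length of the output, here is a minimal Python sketch of the standard CTC collapsing rule (merge consecutive repeats, then drop blanks). It is an illustration only and not the authors' implementation: the `BLANK` symbol, the `ctc_collapse` helper, and the toy alignments are assumptions made for this example, and the "refinement" shown is a hand-written edit rather than a model prediction.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol; real systems reserve a vocabulary index for it


def ctc_collapse(alignment):
    """Merge consecutive repeated tokens, then drop blanks (standard CTC collapsing)."""
    merged = [token for token, _ in groupby(alignment)]
    return [token for token in merged if token != BLANK]


# Hypothetical 8-frame alignment produced by a first pass.
first_pass = ["c", "c", "_", "a", "_", "_", "t", "_"]

# A hand-written "refined" alignment: one blank frame is rewritten to "r".
# The alignment still has 8 frames, but the collapsed output gains a token.
refined = ["c", "c", "_", "a", "_", "r", "t", "_"]

# Rewriting a token frame to blank has the opposite effect: a deletion.
pruned = ["c", "c", "_", "_", "_", "_", "t", "_"]

for name, alignment in [("first_pass", first_pass), ("refined", refined), ("pruned", pruned)]:
    print(name, "".join(ctc_collapse(alignment)))
```

Because every edit happens in the fixed-length alignment space, a refinement step can keep its input and output the same length while the collapsed transcript grows or shrinks, which is the property the abstract refers to as length-changing insertions and deletions.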

Paper Title