Paper Title

Enhancing Monotonic Multihead Attention for Streaming ASR

Authors

Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara

Abstract

We investigate monotonic multihead attention (MMA) by extending hard monotonic attention to Transformer-based automatic speech recognition (ASR) for online streaming applications. For streaming inference, all monotonic attention (MA) heads should learn proper alignments, because the next token is not generated until every head detects the corresponding token boundary. However, we found that not all MA heads learn alignments with a naïve implementation. To encourage every head to learn alignments properly, we propose HeadDrop regularization, which stochastically masks out a subset of heads during training. Furthermore, we propose pruning redundant heads to improve the consensus among heads for boundary detection and to prevent the delayed token generation such heads cause. Chunkwise attention on each MA head is extended to its multihead counterpart. Finally, we propose head-synchronous beam search decoding to guarantee stable streaming inference.
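The abstract describes HeadDrop only at a high level. As a rough illustration of the core idea — stochastically masking out whole attention heads during training so every head is forced to learn a usable alignment — here is a minimal PyTorch-style sketch. The tensor layout, the drop probability `p_drop`, the per-example masking, and the inverted-dropout rescaling are all assumptions made for illustration, not the authors' implementation; consult the paper for the exact formulation.

```python
import torch

def headdrop(head_outputs: torch.Tensor, p_drop: float = 0.5,
             training: bool = True) -> torch.Tensor:
    """Stochastically mask out whole MA heads during training (sketch).

    head_outputs: (batch, n_heads, time, dim) per-head context vectors
        (hypothetical layout).
    p_drop: per-head drop probability (assumed value).
    """
    if not training or p_drop == 0.0:
        return head_outputs
    batch, n_heads = head_outputs.shape[:2]
    # Independent keep/drop decision for each head in each example.
    keep = (torch.rand(batch, n_heads, 1, 1,
                       device=head_outputs.device) >= p_drop).float()
    # Ensure at least one head survives so the output is never all-zero.
    none_kept = keep.sum(dim=1, keepdim=True) == 0
    keep = torch.where(none_kept, torch.ones_like(keep), keep)
    # Rescale surviving heads so the expected magnitude matches
    # inference, analogous to inverted dropout.
    keep = keep * (n_heads / keep.sum(dim=1, keepdim=True))
    return head_outputs * keep
```

Occasionally forcing each head to carry the attention on its own is what pushes every head toward a proper alignment, which is the stated goal of the regularizer.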
