Paper Title

XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers

Paper Authors

Roshan Sharma, Bhiksha Raj

Paper Abstract

Transformers are among the state of the art for many tasks in speech, vision, and natural language processing, among others. Self-attention, a crucial contributor to this performance, has quadratic computational complexity, which makes training on longer input sequences challenging. Prior work has produced state-of-the-art transformer variants with linear attention; however, current models sacrifice performance to achieve efficient implementations. In this work, we develop a novel linear transformer by examining the properties of the key-query product within self-attention. Our model outperforms state-of-the-art approaches on speech recognition and speech summarization, resulting in a 1% absolute WER improvement on the Librispeech-100 speech recognition benchmark and on a new INTERVIEW speech recognition benchmark, and a 5-point ROUGE improvement for summarization on How2.
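The abstract contrasts quadratic softmax self-attention with linear attention obtained by reordering the key-query product. As a minimal sketch of that complexity difference, the NumPy example below implements generic kernelized linear attention, which computes phi(Q) @ (phi(K)^T V) so the T x T attention matrix is never materialized. The feature map `phi` and all function names are illustrative assumptions for exposition, not the XNOR-based approximation the paper itself proposes.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes the (T, T) score matrix, O(T^2 * d)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                 # (T, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized attention: phi(Q) @ (phi(K).T @ V), O(T * d^2).

    phi is an illustrative positive feature map (ReLU + epsilon),
    not the kernel used by XNOR-Former.
    """
    Qp, Kp = phi(Q), phi(K)                            # (T, d) each
    context = Kp.T @ V                                 # (d, d), independent of T
    norm = Qp @ Kp.sum(axis=0)                         # (T,) normalizer
    return (Qp @ context) / norm[:, None]              # (T, d)

# Toy check on a long sequence: both return (T, d) outputs, but the
# linear variant never forms the (T, T) matrix.
T, d = 2048, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)   # (2048, 64)
print(linear_attention(Q, K, V).shape)    # (2048, 64)
```

With this reordering, the dominant cost drops from O(T^2 d) to O(T d^2), which is why linear attention variants scale to the long acoustic sequences that motivate this work.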
