Paper Title

Talking-Heads Attention

Authors

Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, Le Hou

Abstract

We introduce "talking-heads attention" - a variation on multi-head attention which includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additional computation, talking-heads attention leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
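
To make the mechanism concrete, here is a minimal sketch of the idea as described in the abstract: standard scaled dot-product attention with two extra linear projections across the heads dimension, one applied to the logits before the softmax and one to the attention weights after it. The function name, the argument names (`P_logits`, `P_weights`), and the unbatched tensor shapes are assumptions made for illustration, not the paper's notation; the paper also allows the number of heads to differ at each stage, which this sketch omits.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def talking_heads_attention(Q, K, V, P_logits, P_weights):
    """Sketch of talking-heads attention (no batch dimension).

    Q: (h, n, d_k) queries, K: (h, m, d_k) keys, V: (h, m, d_v) values,
    P_logits, P_weights: (h, h) projections across the heads dimension.
    """
    # Standard per-head scaled dot-product logits: (h, n, m)
    logits = np.einsum('hnd,hmd->hnm', Q, K) / np.sqrt(Q.shape[-1])
    # Talking-heads step 1: mix logits across heads before the softmax
    logits = np.einsum('hnm,hk->knm', logits, P_logits)
    weights = softmax(logits, axis=-1)
    # Talking-heads step 2: mix attention weights across heads after the softmax
    weights = np.einsum('hnm,hk->knm', weights, P_weights)
    # Weighted sum of values per head: (h, n, d_v)
    return np.einsum('hnm,hmd->hnd', weights, V)

if __name__ == "__main__":
    h, n, m, d = 4, 5, 6, 8
    rng = np.random.default_rng(0)
    out = talking_heads_attention(
        rng.normal(size=(h, n, d)), rng.normal(size=(h, m, d)),
        rng.normal(size=(h, m, d)),
        rng.normal(size=(h, h)), rng.normal(size=(h, h)))
    print(out.shape)  # (4, 5, 8)
```

Because the logits are mixed across heads before normalization, each head's softmax can draw on evidence gathered by the other heads, which is the behavioral difference from standard multi-head attention, where heads operate independently until their outputs are concatenated.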
