Paper Title
Talking-Heads Attention
Paper Authors
Paper Abstract
We introduce "talking-heads attention" - a variation on multi-head attention which includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additional computation, talking-heads attention leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
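The abstract describes the mechanism concretely enough to sketch: two extra linear projections that mix information across the heads dimension, one applied to the attention logits just before the softmax and one to the attention weights just after it. The following is a minimal NumPy sketch under assumed shapes; the function name `talking_heads_attention` and the parameter names `proj_logits` / `proj_weights` are hypothetical and not taken from the paper, which also allows the number of heads to differ before and after each projection.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def talking_heads_attention(q, k, v, proj_logits, proj_weights):
    """Sketch of talking-heads attention (assumed shapes).

    q, k, v:       [heads, seq, depth] query / key / value tensors
    proj_logits:   [heads, heads] head-mixing projection before the softmax
    proj_weights:  [heads, heads] head-mixing projection after the softmax
    """
    depth = q.shape[-1]
    # Standard scaled dot-product logits per head: [heads, seq_q, seq_k]
    logits = np.einsum("hid,hjd->hij", q, k) / np.sqrt(depth)
    # Talking-heads step 1: linear projection across the heads dimension
    logits = np.einsum("hij,hg->gij", logits, proj_logits)
    weights = softmax(logits, axis=-1)
    # Talking-heads step 2: another projection across heads, post-softmax
    weights = np.einsum("hij,hg->gij", weights, proj_weights)
    # Weighted sum of values: [heads, seq_q, depth]
    return np.einsum("hij,hjd->hid", weights, v)

# Example usage with random tensors (hypothetical sizes)
heads, seq, depth = 4, 8, 16
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((heads, seq, depth)) for _ in range(3))
P_l = rng.standard_normal((heads, heads)) / np.sqrt(heads)
P_w = rng.standard_normal((heads, heads)) / np.sqrt(heads)
out = talking_heads_attention(q, k, v, P_l, P_w)
print(out.shape)  # (4, 8, 16)
```

Note that after the post-softmax mixing, each head's attention weights no longer necessarily sum to one; the mixing matrices are general learned linear maps, which is where the small number of additional parameters mentioned in the abstract comes from.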