Paper Title
Alleviating the Inequality of Attention Heads for Neural Machine Translation
Paper Authors
Paper Abstract
Recent studies show that the attention heads in the Transformer are not equal. We relate this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, applied in two specific ways. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
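
As a rough illustration of the general idea, below is a minimal PyTorch sketch of multi-head attention in which a random subset of heads is zeroed out during training, so the remaining heads must carry the signal and receive more balanced training. The module name MaskedMultiHeadAttention and the mask_ratio hyperparameter are assumptions introduced for this example; the abstract does not describe the two specific HeadMask variants, so this is not the authors' exact method.

```python
import torch
import torch.nn as nn


class MaskedMultiHeadAttention(nn.Module):
    """Multi-head attention with random head masking during training.

    Illustrative sketch only: it assumes a simple variant that zeroes
    out a random fraction (mask_ratio) of heads at each training step.
    """

    def __init__(self, d_model: int, num_heads: int, mask_ratio: float = 0.25):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.mask_ratio = mask_ratio  # fraction of heads masked per step (assumed)
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, query, key, value):
        # query/key/value: (batch, seq_len, d_model)
        b, t_q, _ = query.shape
        t_k = key.shape[1]

        # Project and split into heads: (batch, heads, seq_len, d_head)
        q = self.q_proj(query).view(b, t_q, self.num_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(key).view(b, t_k, self.num_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(value).view(b, t_k, self.num_heads, self.d_head).transpose(1, 2)

        # Scaled dot-product attention per head
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = attn @ v  # (batch, heads, t_q, d_head)

        if self.training and self.mask_ratio > 0:
            # Zero out a random subset of heads so the surviving heads
            # are forced to contribute, balancing training across heads.
            n_masked = int(self.num_heads * self.mask_ratio)
            masked_idx = torch.randperm(self.num_heads, device=heads.device)[:n_masked]
            head_mask = torch.ones(self.num_heads, device=heads.device)
            head_mask[masked_idx] = 0.0
            heads = heads * head_mask.view(1, self.num_heads, 1, 1)

        # Concatenate heads and project back to d_model
        out = heads.transpose(1, 2).reshape(b, t_q, self.num_heads * self.d_head)
        return self.out_proj(out)
```

In this sketch the masking is active only in training mode; at inference time all heads are used, which matches the intuition that the goal is to regularize training rather than to prune heads permanently.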