Paper Title
Adversarial Self-Attention for Language Understanding
Paper Authors
Abstract
Deep neural models (e.g. Transformer) naturally learn spurious features, which create a ``shortcut'' between the labels and inputs, impairing generalization and robustness. This paper advances the self-attention mechanism to a robust variant for Transformer-based pre-trained language models (e.g. BERT). We propose the \textit{Adversarial Self-Attention} mechanism (ASA), which adversarially biases the attention to effectively suppress the model's reliance on spurious features (e.g. specific keywords) and encourage its exploration of broader semantics. We conduct a comprehensive evaluation across a wide range of tasks for both the pre-training and fine-tuning stages. For pre-training, ASA yields remarkable performance gains over naive training, even when the latter runs for longer steps. For fine-tuning, ASA-empowered models outperform naive models by a large margin in both generalization and robustness.
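To make the idea concrete, here is a minimal NumPy sketch of adversarially biasing self-attention. In the paper ASA learns the adversarial mask via optimization against the task loss; the heuristic below (blocking each query's top-attended key so attention must spread to the remaining context) is purely illustrative, and all function names and shapes are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V, mask=None):
    # Scaled dot-product self-attention.
    # mask: boolean array, True = position blocked (score set to -inf-like).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    if mask is not None:
        scores = np.where(mask, -1e9, scores)
    return softmax(scores) @ V

def adversarial_mask(Q, K, top_k=1):
    # Illustrative "adversarial" bias (NOT the learned mask from the paper):
    # block the top_k keys each query attends to most, forcing the model
    # to rely on the broader remaining context.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    mask = np.zeros_like(scores, dtype=bool)
    idx = np.argsort(scores, axis=-1)[:, -top_k:]
    np.put_along_axis(mask, idx, True, axis=-1)
    return mask

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, head dim 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

out_naive = self_attention(Q, K, V)
out_asa = self_attention(Q, K, V, mask=adversarial_mask(Q, K))
```

In training, the mask would be chosen to maximize the task loss (the adversary) while the model parameters are updated to minimize it, which is what suppresses shortcut features.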