Paper Title

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Authors

Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Abstract

Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable to textual adversarial attacks. We aim to address this problem from an information-theoretic perspective, and propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models. InfoBERT contains two mutual-information-based regularizers for model training: (i) an Information Bottleneck regularizer, which suppresses noisy mutual information between the input and the feature representation; and (ii) a Robust Feature regularizer, which increases the mutual information between local robust features and global features. We provide a principled way to theoretically analyze and improve the robustness of representation learning for language models in both standard and adversarial training. Extensive experiments demonstrate that InfoBERT achieves state-of-the-art robust accuracy on several adversarial datasets for Natural Language Inference (NLI) and Question Answering (QA) tasks. Our code is available at https://github.com/AI-secure/InfoBERT.
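
To make the two regularizers concrete, below is a minimal PyTorch-style sketch of such a training objective, based only on the abstract above. Everything in it is an assumption for illustration: the function names (infonce_lower_bound, infobert_style_loss), the use of the InfoNCE bound as the mutual-information estimator, the mean-pooling over a hypothetical robust_mask, and the default alpha/beta coefficients are not taken from the paper; see the linked repository for the authors' actual implementation.

```python
import torch
import torch.nn.functional as F


def infonce_lower_bound(x, y, temperature=0.1):
    # InfoNCE estimate of the mutual information I(X; Y) for paired batches
    # x, y of shape (N, D): row i of x is the positive match of row i of y,
    # and all other rows serve as in-batch negatives. Returns -CE, which
    # equals the InfoNCE lower bound up to an additive log(N) constant.
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    logits = x @ y.t() / temperature               # (N, N) similarity matrix
    labels = torch.arange(x.size(0), device=x.device)
    return -F.cross_entropy(logits, labels)


def infobert_style_loss(task_loss, word_embeds, local_feats, global_feat,
                        robust_mask, alpha=5e-3, beta=5e-3):
    # task_loss:   scalar fine-tuning loss (e.g. cross-entropy for NLI/QA)
    # word_embeds: (B, T, D) input word embeddings X
    # local_feats: (B, T, D) contextual token representations T
    # global_feat: (B, D)    sentence-level representation Z (e.g. [CLS])
    # robust_mask: (B, T)    bool mask selecting "robust" tokens; the
    #                        selection criterion here is hypothetical
    B, T, D = local_feats.shape

    # (i) Information Bottleneck regularizer: suppress the (noisy) mutual
    # information between inputs and their features, i.e. minimize I(X; T).
    ib_mi = infonce_lower_bound(word_embeds.reshape(B * T, D),
                                local_feats.reshape(B * T, D))

    # (ii) Robust Feature regularizer: increase the mutual information
    # between local robust features and the global feature. Robust tokens
    # are mean-pooled per example, then contrasted with Z.
    denom = robust_mask.sum(dim=1, keepdim=True).clamp(min=1)
    pooled = (local_feats * robust_mask.unsqueeze(-1)).sum(dim=1) / denom
    rf_mi = infonce_lower_bound(pooled, global_feat)

    # Minimize the task loss and I(X; T); maximize I(T_robust; Z).
    return task_loss + alpha * ib_mi - beta * rf_mi
```

In this sketch, the task loss and the I(X; T) estimate are minimized while the I(T_robust; Z) estimate is maximized, matching the "suppresses" and "increases" directions described in the abstract; the alpha/beta weights would in practice be tuned per task.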
