Paper Title
Evaluating Neural Machine Comprehension Model Robustness to Noisy Inputs and Adversarial Attacks
Paper Authors
Paper Abstract
We evaluate machine comprehension models' robustness to noise and adversarial attacks by performing novel perturbations at the character, word, and sentence levels. We experiment with different amounts of perturbation to examine model confidence and misclassification rate, and contrast model performance under adversarial training with different embedding types on two benchmark datasets. We demonstrate improved model performance through ensembling. Finally, we analyze factors that affect model behavior under adversarial training and develop a model to predict model errors during adversarial attacks.
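To make the character-level perturbations concrete, here is a minimal sketch of one common noise operation (swapping adjacent characters) applied at a configurable rate. This is an illustration only; the function name, swap-based noise, and the `rate` parameter are assumptions, not the paper's exact perturbation scheme.

```python
import random

def perturb_chars(text, rate=0.1, seed=0):
    """Swap adjacent characters at a fraction of positions in `text`.

    Illustrative character-level noise (hypothetical helper); the
    paper's actual perturbations at the character, word, and sentence
    levels are not reproduced here.
    """
    if len(text) < 2:
        return text
    rng = random.Random(seed)  # fixed seed for reproducible noise
    chars = list(text)
    n_swaps = max(1, int(len(chars) * rate))
    for _ in range(n_swaps):
        # pick a random adjacent pair and swap it
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```

A study like this would sweep `rate` to measure how model confidence and misclassification rate degrade as the input gets noisier.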