Paper Title
Fast Training of Deep Neural Networks Robust to Adversarial Perturbations
Paper Authors
Paper Abstract
Deep neural networks are capable of training fast and generalizing well within many domains. Despite their promising performance, deep networks have shown sensitivities to perturbations of their inputs (e.g., adversarial examples) and their learned feature representations are often difficult to interpret, raising concerns about their true capability and trustworthiness. Recent work in adversarial training, a form of robust optimization in which the model is optimized against adversarial examples, demonstrates the ability to improve performance sensitivities to perturbations and yield feature representations that are more interpretable. Adversarial training, however, comes with an increased computational cost over that of standard (i.e., nonrobust) training, rendering it impractical for use in large-scale problems. Recent work suggests that a fast approximation to adversarial training shows promise for reducing training time and maintaining robustness in the presence of perturbations bounded by the infinity norm. In this work, we demonstrate that this approach extends to the Euclidean norm and preserves the human-aligned feature representations that are common for robust models. Additionally, we show that using a distributed training scheme can further reduce the time to train robust deep networks. Fast adversarial training is a promising approach that will provide increased security and explainability in machine learning applications for which robust optimization was previously thought to be impractical.
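The abstract refers to a fast, single-step approximation to adversarial training and its extension from infinity-norm-bounded to Euclidean-norm-bounded perturbations. The sketch below is a minimal illustration of that general recipe, not the authors' implementation: it assumes a PyTorch image classifier with 4-D input batches, and the hyperparameter names and default values (epsilon, alpha) are illustrative placeholders rather than settings from the paper.

import torch
import torch.nn.functional as F

def fgsm_linf_delta(model, x, y, epsilon, alpha):
    """One-step L-infinity perturbation with a random start inside the epsilon-ball."""
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    # Signed-gradient step, then project back onto the L-infinity ball.
    return (delta + alpha * grad.sign()).clamp(-epsilon, epsilon).detach()

def fgsm_l2_delta(model, x, y, epsilon, alpha):
    """One-step Euclidean (L2) perturbation; assumes image batches shaped (N, C, H, W)."""
    delta = torch.randn_like(x)
    delta = epsilon * delta / delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    delta.requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    # Step along the normalized gradient, then project onto the L2 ball of radius epsilon.
    grad = grad / (grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
    delta = delta + alpha * grad
    norms = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    return (delta * (epsilon / norms).clamp(max=1.0)).detach()

def robust_train_epoch(model, loader, optimizer, epsilon=8 / 255, alpha=10 / 255,
                       attack=fgsm_linf_delta, device="cpu"):
    """Train for one epoch on single-step adversarial examples (fast adversarial training)."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        delta = attack(model, x, y, epsilon, alpha)
        # Robust optimization: minimize the loss on perturbed rather than clean inputs.
        # (Clipping x + delta to the valid input range is omitted for brevity.)
        loss = F.cross_entropy(model(x + delta), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The distributed training scheme mentioned in the abstract is not shown here; one common way to layer data-parallel training onto a loop like this is to wrap the model with PyTorch's torch.nn.parallel.DistributedDataParallel, though the paper's specific setup may differ.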