Paper Title

On the Robustness of Explanations of Deep Neural Network Models: A Survey

Authors

Amlan Jyoti, Karthik Balaji Ganesh, Manoj Gayala, Nandita Lakshmi Tunuguntla, Sandesh Kamath, Vineeth N Balasubramanian

Abstract

Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.
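
To make the central phenomenon concrete — a saliency-style explanation being pushed far from the original while the prediction stays fixed — here is a minimal, hypothetical PyTorch sketch of a gradient-attribution attack in the spirit of the attacks the survey reviews. The function names, the cosine-similarity objective, and the hyperparameters (`eps`, `steps`, `lr`) are illustrative assumptions, not the specific method of any surveyed work.

```python
import torch
import torch.nn.functional as F

def saliency(model, x):
    """Plain gradient attribution: gradient of the top logit w.r.t. the input."""
    if not x.requires_grad:
        x = x.clone().requires_grad_(True)
    score = model(x).max(dim=1).values.sum()
    # create_graph=True keeps the graph so the attack can differentiate
    # through this attribution (a second-order gradient).
    (attr,) = torch.autograd.grad(score, x, create_graph=True)
    return attr

def attribution_attack(model, x, eps=8 / 255, steps=50, lr=1e-2):
    """Sketch: search for a small (L-infinity bounded) perturbation that
    distorts the saliency map while, ideally, preserving the prediction."""
    orig_attr = saliency(model, x).detach()
    orig_pred = model(x).argmax(dim=1)

    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        attr = saliency(model, x + delta)
        # Minimizing cosine similarity pushes the perturbed attribution
        # away from the original one.
        loss = F.cosine_similarity(attr.flatten(1),
                                   orig_attr.flatten(1), dim=1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the perturbation imperceptible

    x_adv = (x + delta).detach()
    # A successful attributional attack leaves the prediction unchanged.
    pred_unchanged = bool((model(x_adv).argmax(dim=1) == orig_pred).all())
    return x_adv, pred_unchanged
```

One practical caveat: for piecewise-linear (ReLU) networks the second-order gradients this loop relies on are zero almost everywhere, so attacks of this kind typically swap in a smooth surrogate such as softplus during the attack. That detail, like the code above, is a sketch of the general technique rather than the survey's prescribed procedure.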
