Paper Title

Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts

Authors

Sebastian Bordt, Michèle Finck, Eric Raidl, Ulrike von Luxburg

Abstract

Existing and planned legislation stipulates various obligations to provide information about machine learning algorithms and their functioning, often interpreted as obligations to "explain". Many researchers suggest using post-hoc explanation algorithms for this purpose. In this paper, we combine legal, philosophical and technical arguments to show that post-hoc explanation algorithms are unsuitable to achieve the law's objectives. Indeed, most situations where explanations are requested are adversarial, meaning that the explanation provider and receiver have opposing interests and incentives, so that the provider might manipulate the explanation for her own ends. We show that this fundamental conflict cannot be resolved because of the high degree of ambiguity of post-hoc explanations in realistic application scenarios. As a consequence, post-hoc explanation algorithms are unsuitable to achieve the transparency objectives inherent to the legal norms. Instead, there is a need to more explicitly discuss the objectives underlying "explainability" obligations as these can often be better achieved through other mechanisms. There is an urgent need for a more open and honest discussion regarding the potential and limitations of post-hoc explanations in adversarial contexts, in particular in light of the current negotiations of the European Union's draft Artificial Intelligence Act.
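The abstract's claim that post-hoc explanations are highly ambiguous can be made concrete with a small sketch. The scenario below is hypothetical (the loan model, feature values, and baselines are illustrative inventions, not taken from the paper): a simple ablation-style attribution, where each feature's importance is the prediction drop when that feature is replaced by a reference value. The explanation provider's unregulated choice of reference point flips which feature appears to drive the decision, which is exactly the kind of degree of freedom an adversarial provider could exploit.

```python
# Hypothetical loan-scoring model (illustration only, not from the paper):
# score = income + 2 * savings, in arbitrary units.
def model(x):
    return x[0] + 2 * x[1]

def ablation_attribution(x, baseline):
    """Attribute the prediction to each feature as the drop in model
    output when that feature is replaced by its baseline value."""
    attributions = []
    for i in range(len(x)):
        x_ablated = list(x)
        x_ablated[i] = baseline[i]
        attributions.append(model(x) - model(x_ablated))
    return attributions

x = [3.0, 1.0]  # applicant: income = 3, savings = 1

# Same model, same applicant -- only the baseline (reference point) differs.
attr_zero = ablation_attribution(x, [0.0, 0.0])  # zero baseline
attr_mean = ablation_attribution(x, [5.0, 0.0])  # e.g. a "typical applicant" baseline

print(attr_zero)  # [3.0, 2.0]: income looks like the dominant factor
print(attr_mean)  # [-2.0, 2.0]: income now appears to count *against* approval
```

Both attributions are internally consistent, yet they tell opposite stories about the role of income; neither the model nor the input changed, only a parameter the explanation provider controls.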
