Paper Title

Trust in AI: Interpretability is not necessary or sufficient, while black-box interaction is necessary and sufficient

Paper Authors

Shen, Max W.

Paper Abstract

The problem of human trust in artificial intelligence is one of the most fundamental problems in applied machine learning. Our processes for evaluating AI trustworthiness have substantial ramifications for ML's impact on science, health, and humanity, yet confusion surrounds foundational concepts. What does it mean to trust an AI, and how do humans assess AI trustworthiness? What are the mechanisms for building trustworthy AI? And what is the role of interpretable ML in trust? Here, we draw from statistical learning theory and sociological lenses on human-automation trust to motivate an AI-as-tool framework, which distinguishes human-AI trust from human-AI-human trust. Evaluating an AI's contractual trustworthiness involves predicting future model behavior using behavior certificates (BCs) that aggregate behavioral evidence from diverse sources including empirical out-of-distribution and out-of-task evaluation and theoretical proofs linking model architecture to behavior. We clarify the role of interpretability in trust with a ladder of model access. Interpretability (level 3) is not necessary or even sufficient for trust, while the ability to run a black-box model at-will (level 2) is necessary and sufficient. While interpretability can offer benefits for trust, it can also incur costs. We clarify ways interpretability can contribute to trust, while questioning the perceived centrality of interpretability to trust in popular discourse. How can we empower people with tools to evaluate trust? Instead of trying to understand how a model works, we argue for understanding how a model behaves. Instead of opening up black boxes, we should create more behavior certificates that are more correct, relevant, and understandable. We discuss how to build trusted and trustworthy AI responsibly.
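To make the abstract's two load-bearing ideas concrete, here is a minimal, hypothetical Python sketch (not code from the paper): a model exposed only through level-2 access, meaning it can be run at will as a black box, and a behavior certificate that aggregates accuracy evidence from in-distribution, out-of-distribution, and out-of-task evaluation suites. The names `BehaviorCertificate`, `evaluate_suite`, and `certify` are illustrative assumptions, not an API defined in the paper.

```python
"""Illustrative sketch of level-2 (black-box, run-at-will) access and a
behavior certificate (BC) that aggregates behavioral evidence. All names
here are assumptions for illustration, not the paper's formalism."""
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

# Level-2 access: the model is only a callable we may query on chosen inputs.
BlackBoxModel = Callable[[Sequence[float]], int]


@dataclass
class BehaviorCertificate:
    """Aggregates behavioral evidence about a black-box model."""
    model_id: str
    evidence: Dict[str, float] = field(default_factory=dict)  # suite name -> accuracy

    def add_evidence(self, suite_name: str, accuracy: float) -> None:
        self.evidence[suite_name] = accuracy

    def summary(self) -> str:
        lines = [f"Behavior certificate for {self.model_id}:"]
        for suite, acc in sorted(self.evidence.items()):
            lines.append(f"  {suite}: accuracy = {acc:.3f}")
        return "\n".join(lines)


def evaluate_suite(model: BlackBoxModel,
                   suite: List[Tuple[Sequence[float], int]]) -> float:
    """Run the black box at will on a labeled evaluation suite."""
    correct = sum(1 for x, y in suite if model(x) == y)
    return correct / len(suite)


def certify(model: BlackBoxModel, model_id: str,
            suites: Dict[str, List[Tuple[Sequence[float], int]]]) -> BehaviorCertificate:
    """Build a BC from several evaluation suites (e.g., in-distribution,
    out-of-distribution, out-of-task), using only black-box queries."""
    bc = BehaviorCertificate(model_id)
    for name, suite in suites.items():
        bc.add_evidence(name, evaluate_suite(model, suite))
    return bc


if __name__ == "__main__":
    # Toy black box: predicts 1 if the mean of the input is positive.
    toy_model: BlackBoxModel = lambda x: int(sum(x) / len(x) > 0)

    suites = {
        "in_distribution": [([0.5, 1.0], 1), ([-0.5, -1.0], 0)],
        "out_of_distribution": [([100.0, 80.0], 1), ([-3.0, 0.1], 0)],
        "out_of_task_proxy": [([0.0, 0.2], 1), ([0.0, -0.2], 0)],
    }
    print(certify(toy_model, "toy-model-v1", suites).summary())
```

Note that the certificate records only what the model does on chosen inputs, never how it works internally, which is exactly the distinction the abstract draws between level-2 access and interpretability (level 3).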
