Paper Title

A Human-Centric Assessment Framework for AI

Authors

Sascha Saralajew, Ammar Shaker, Zhao Xu, Kiril Gashteovski, Bhushan Kotnis, Wiem Ben Rim, Jürgen Quittek, Carolin Lawrence

Abstract

With the rise of AI systems in real-world applications comes the need for reliable and trustworthy AI. An essential aspect of this is explainable AI systems. However, there is no agreed standard on how explainable AI systems should be assessed. Inspired by the Turing test, we introduce a human-centric assessment framework where a leading domain expert accepts or rejects the solutions of an AI system and another domain expert. By comparing the acceptance rates of provided solutions, we can assess how the AI system performs compared to the domain expert, and whether the AI system's explanations (if provided) are human-understandable. This setup -- comparable to the Turing test -- can serve as a framework for a wide range of human-centric AI system assessments. We demonstrate this by presenting two instantiations: (1) an assessment that measures the classification accuracy of a system with the option to incorporate label uncertainties; (2) an assessment where the usefulness of provided explanations is determined in a human-centric manner.
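
To make the core comparison concrete, below is a minimal illustrative sketch of how the acceptance-rate comparison described in the abstract could be computed. The function name, the judged cases, and the accept/reject data are hypothetical illustrations under stated assumptions, not the authors' implementation.

```python
# Minimal sketch: compare how often a leading domain expert accepts solutions
# from an AI system versus solutions from another domain expert.
# All data below is hypothetical and for illustration only.

def acceptance_rate(judgments):
    """Fraction of solutions the leading expert accepted (True = accept)."""
    return sum(judgments) / len(judgments)

# Hypothetical accept/reject decisions by the leading expert on the same set
# of cases, once for the AI system's solutions and once for the human expert's.
ai_judgments = [True, True, False, True, True, False, True, True]
expert_judgments = [True, True, True, False, True, True, True, False]

print(f"AI acceptance rate:     {acceptance_rate(ai_judgments):.2f}")
print(f"Expert acceptance rate: {acceptance_rate(expert_judgments):.2f}")
# Comparable rates would suggest the AI system's solutions (and, if shown to
# the judge, its explanations) are accepted about as often as the expert's.
```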
