On the Utility of Prediction Sets in Human-AI Teams

Paper title

On the Utility of Prediction Sets in Human-AI Teams

Authors

Varun Babbar, Umang Bhatt, Adrian Weller

Abstract

Research on human-AI teams usually provides experts with a single label, which ignores the uncertainty in a model's recommendation. Conformal prediction (CP) is a well-established line of research that focuses on building a theoretically grounded, calibrated prediction set, which may contain multiple labels. We explore how such prediction sets impact expert decision-making in human-AI teams. Our evaluation on human subjects finds that set-valued predictions positively impact experts. However, we notice that the prediction sets provided by CP can be very large, which leads to unhelpful AI assistants. To mitigate this, we introduce D-CP, a method that performs CP on some examples and defers the rest to experts. We prove that D-CP can reduce the prediction set size of non-deferred examples. We show how D-CP performs in quantitative and in human subject experiments ($n=120$). Our results suggest that CP prediction sets improve human-AI team performance over showing the top-1 prediction alone, and that experts find D-CP prediction sets more useful than CP prediction sets.
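
To make the idea of a calibrated prediction set concrete, the following is a minimal sketch of standard split conformal prediction for classification. It is not taken from the paper: the function names, the nonconformity score (one minus the probability assigned to a label), and the toy data in the usage note are all assumptions chosen for illustration. The deferral step of D-CP is not shown, since the abstract does not describe how examples are selected for deferral.

```python
import numpy as np


def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute a conformal threshold from held-out calibration data.

    cal_probs:  (n, num_classes) array of predicted class probabilities.
    cal_labels: (n,) array of true integer labels.
    alpha:      target miscoverage rate (0.1 aims for about 90% coverage).
    """
    n = len(cal_labels)
    # Assumed nonconformity score: 1 - probability of the true label.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Rank of the finite-sample corrected quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: keep every label.
        return 1.0
    return scores[k - 1]


def prediction_set(probs, threshold):
    """Return the labels whose nonconformity score is within the threshold."""
    return np.flatnonzero(1.0 - probs <= threshold)
```

As a usage illustration, calling `calibrate_threshold` on a few hundred held-out examples and then `prediction_set` on a new probability vector yields the set of labels that would be shown to the expert. A confident model tends to produce small sets, while an uncertain one produces the large sets that the abstract identifies as unhelpful.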
