Paper Title
A Learning Theoretic Perspective on Local Explainability
Paper Authors
Paper Abstract
In this paper, we explore connections between interpretable machine learning and learning theory through the lens of local approximation explanations. First, we tackle the traditional problem of performance generalization and bound the test-time accuracy of a model using a notion of how locally explainable it is. Second, we explore the novel problem of explanation generalization, which is an important concern for a growing class of finite-sample-based local approximation explanations. Finally, we validate our theoretical results empirically and show that they reflect what can be seen in practice.