Title
Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles
Authors
Abstract
Counterfactual explanations describe how to modify a feature vector in order to flip the outcome of a trained classifier. Obtaining robust counterfactual explanations is essential to provide valid algorithmic recourse and meaningful explanations. We study the robustness of explanations of randomized ensembles, which are always subject to algorithmic uncertainty even when the training data is fixed. We formalize the generation of robust counterfactual explanations as a probabilistic problem and show the link between the robustness of ensemble models and the robustness of base learners. We develop a practical method with good empirical performance and support it with theoretical guarantees for ensembles of convex base learners. Our results show that existing methods give surprisingly low robustness: the validity of naive counterfactuals is below $50\%$ on most data sets and can fall to $20\%$ on problems with many features. In contrast, our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
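The central quantity in the abstract, the validity of a counterfactual under the algorithmic randomness of ensemble training, can be illustrated with a toy sketch. This is not the paper's method: it uses a hypothetical 1D dataset, a bagged ensemble of decision stumps, and a naive counterfactual found by stepping across the reference ensemble's decision boundary; validity is then estimated as the fraction of ensembles retrained with fresh seeds (on the same fixed data) that still flip the prediction.

```python
import random

# Toy 1D data with a noisy boundary: label 1 iff x > 0.5, with four
# fixed label flips near 0.5 so bootstrap-trained stumps disagree.
xs = [i / 40 for i in range(41)]
ys = [1 if x > 0.5 else 0 for x in xs]
for noisy in (0.425, 0.475):
    ys[xs.index(noisy)] = 1
for noisy in (0.525, 0.575):
    ys[xs.index(noisy)] = 0

def train_stump(rng):
    # Fit a decision stump (rule: predict 1 iff x > t) on a bootstrap
    # resample; this resampling is the algorithmic randomness the
    # ensemble is subject to even though the data itself is fixed.
    idx = [rng.randrange(len(xs)) for _ in xs]
    bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
    return min(sorted(set(bx)),
               key=lambda t: sum((x > t) != (y == 1) for x, y in zip(bx, by)))

def train_ensemble(seed, m=25):
    rng = random.Random(seed)
    return [train_stump(rng) for _ in range(m)]

def predict(stumps, x):
    # Majority vote over the stumps.
    return 1 if sum(x > t for t in stumps) * 2 > len(stumps) else 0

ref = train_ensemble(seed=0)
cf_naive = 0.30                        # original observation, class 0
while predict(ref, cf_naive) == 0:     # naive counterfactual: first point
    cf_naive += 0.01                   # that flips the *reference* ensemble
cf_margin = cf_naive + 0.10            # same direction, extra margin

# Validity: fraction of ensembles retrained with fresh seeds (same data)
# that still flip. The naive counterfactual sits right on the reference
# boundary, so retrained ensembles may disagree; the margin helps.
others = [train_ensemble(seed=s) for s in range(1, 51)]
validity_naive = sum(predict(e, cf_naive) for e in others) / len(others)
validity_margin = sum(predict(e, cf_margin) for e in others) / len(others)
print(f"naive cf at x={cf_naive:.2f}: validity {validity_naive:.0%}")
print(f"margin cf at x={cf_margin:.2f}: validity {validity_margin:.0%}")
```

Because each stump's vote is monotone in x, pushing the counterfactual further past the boundary can only increase validity here; the trade-off the abstract describes is exactly this extra distance from the original observation.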