Paper Title
TRUST-LAPSE: An Explainable and Actionable Mistrust Scoring Framework for Model Monitoring
Paper Authors
Paper Abstract
Continuous monitoring of trained ML models to determine when their predictions should and should not be trusted is essential for their safe deployment. Such a framework ought to be high-performing, explainable, post-hoc, and actionable. We propose TRUST-LAPSE, a "mistrust" scoring framework for continuous model monitoring. We assess the trustworthiness of each input sample's model prediction using a sequence of latent-space embeddings. Specifically, (a) our latent-space mistrust score estimates mistrust using distance metrics (Mahalanobis distance) and similarity metrics (cosine similarity) in the latent space, and (b) our sequential mistrust score detects deviations in correlations over the sequence of past input representations via a non-parametric, sliding-window algorithm, enabling actionable continuous monitoring. We evaluate TRUST-LAPSE on two downstream tasks: (1) distributionally shifted input detection and (2) data drift detection. We evaluate across diverse domains (audio and vision) using public datasets, and further benchmark our approach on challenging, real-world electroencephalogram (EEG) datasets for seizure detection. Our latent-space mistrust scores achieve state-of-the-art results, with AUROCs of 84.1 (vision), 73.9 (audio), and 77.1 (clinical EEGs), outperforming baselines by over 10 points. We expose critical failures in popular baselines that remain insensitive to the semantic content of inputs, rendering them unfit for real-world model monitoring. Our sequential mistrust scores achieve high drift detection rates: over 90% of streams show less than 20% error across all domains. Through extensive qualitative and quantitative evaluations, we show that our mistrust scores are more robust and provide explainability for easy adoption into practice.
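To make the two scoring components concrete, below is a minimal Python sketch of a latent-space mistrust score in the spirit of the abstract: a Mahalanobis distance and a cosine similarity computed on encoder embeddings. The function names, the single-Gaussian fit to in-distribution embeddings, and the way the two metrics are combined are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def fit_latent_statistics(train_embeddings: np.ndarray):
    """Fit mean and inverse (regularized) covariance of in-distribution embeddings."""
    mean = train_embeddings.mean(axis=0)
    cov = np.cov(train_embeddings, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize so the covariance is invertible
    return mean, np.linalg.inv(cov)


def latent_mistrust_score(z: np.ndarray, mean: np.ndarray, cov_inv: np.ndarray) -> float:
    """Higher score = less trustworthy prediction (assumed combination rule)."""
    diff = z - mean
    mahalanobis = float(np.sqrt(diff @ cov_inv @ diff))
    cosine_sim = float(z @ mean / (np.linalg.norm(z) * np.linalg.norm(mean) + 1e-12))
    # Assumed combination: larger distance raises mistrust, higher similarity lowers it.
    return mahalanobis - cosine_sim
```

The sequential component can likewise be sketched as a non-parametric, sliding-window check on correlations with past input representations. The window size, drift threshold, use of cosine correlation, and the median statistic below are illustrative assumptions chosen to mirror the description in the abstract.

```python
from collections import deque

import numpy as np


def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


class SequentialMistrustMonitor:
    """Hypothetical sliding-window monitor over a stream of latent embeddings."""

    def __init__(self, window_size: int = 50, threshold: float = 0.3):
        self.window = deque(maxlen=window_size)  # past input representations
        self.threshold = threshold               # assumed drift threshold

    def update(self, z: np.ndarray) -> bool:
        """Return True if the new embedding signals drift relative to the window."""
        if len(self.window) < self.window.maxlen:
            self.window.append(z)
            return False
        # Median correlation of the new sample with the window of past embeddings;
        # a sustained drop indicates a deviation from the recent input stream.
        corr = float(np.median([_cosine(z, past) for past in self.window]))
        self.window.append(z)
        return corr < self.threshold
```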