Paper title
Is your model predicting the past?
Paper authors
Paper abstract
When does a machine learning model predict the future of individuals and when does it recite patterns that predate the individuals? In this work, we propose a distinction between these two pathways of prediction, supported by theoretical, empirical, and normative arguments. At the center of our proposal is a family of simple and efficient statistical tests, called backward baselines, that demonstrate if, and to what extent, a model recounts the past. Our statistical theory provides guidance for interpreting backward baselines, establishing equivalences between different baselines and familiar statistical concepts. Concretely, we derive a meaningful backward baseline for auditing a prediction system as a black box, given only background variables and the system's predictions. Empirically, we evaluate the framework on different prediction tasks derived from longitudinal panel surveys, demonstrating the ease and effectiveness of incorporating backward baselines into the practice of machine learning.
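The abstract describes a backward baseline that audits a black-box prediction system using only background variables and the system's predictions. What follows is a minimal sketch of one way such an audit could be operationalized. It assumes discrete background variables and squared-error loss, and it is our illustration, not the paper's exact test. The function name `backward_baseline_r2` and the example variables are hypothetical. The sketch predicts each individual's score by the mean score among individuals who share the same background profile. It then reports the share of score variance this recovers, which is the correlation ratio (eta squared).

```python
from collections import defaultdict
from statistics import fmean, pvariance


def backward_baseline_r2(background, predictions):
    """Share of variance in a black-box system's predictions that is
    explained by background variables alone.

    Hypothetical operationalization for illustration only:
      background:  list of hashable tuples, one per individual, holding
                   variables that predate the individual (e.g. parental
                   income bracket, region of birth)
      predictions: list of floats, the audited system's scores
    """
    if not predictions or len(background) != len(predictions):
        raise ValueError("need equally many background rows and predictions")
    total_var = pvariance(predictions)
    if total_var == 0:
        # Constant predictions: the explained share is undefined.
        raise ValueError("predictions have zero variance")

    # Group the system's predictions by background profile.
    groups = defaultdict(list)
    for x, p in zip(background, predictions):
        groups[x].append(p)

    # Backward baseline: the conditional mean of the system's score given
    # the background profile, i.e. the best squared-error predictor that
    # uses only information predating the individual.
    group_mean = {x: fmean(ps) for x, ps in groups.items()}
    fitted = [group_mean[x] for x in background]

    # Fraction of variance recovered = 1 - residual variance / total variance.
    residual_var = fmean((p - f) ** 2 for p, f in zip(predictions, fitted))
    return 1.0 - residual_var / total_var


# Usage: two background groups whose scores differ mostly between groups.
print(backward_baseline_r2([("low",), ("low",), ("high",), ("high",)],
                           [0.2, 0.4, 0.8, 0.6]))
```

A value near 1 would suggest that the system's outputs are largely recoverable from circumstances that predate the individual. When there are many distinct background profiles, the in-sample value is inflated because small groups fit their own scores. Computing the group means on one split and evaluating on a held-out split would therefore be advisable.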