Paper title
Selection by Prediction with Conformal p-values
Paper authors
Paper abstract
Decision making or scientific discovery pipelines such as job hiring and drug discovery often involve multiple stages: before any resource-intensive step, there is often an initial screening that uses predictions from a machine learning model to shortlist a few candidates from a large pool. We study screening procedures that aim to select candidates whose unobserved outcomes exceed user-specified values. We develop a method that wraps around any prediction model to produce a subset of candidates while controlling the proportion of falsely selected units. Building upon the conformal inference framework, our method first constructs p-values that quantify the statistical evidence for large outcomes; it then determines the shortlist by comparing the p-values to a threshold introduced in the multiple testing literature. In many cases, the procedure selects candidates whose predictions are above a data-dependent threshold. Our theoretical guarantee holds under mild exchangeability conditions on the samples, generalizing existing results on multiple conformal p-values. We demonstrate the empirical performance of our method via simulations, and apply it to job hiring and drug discovery datasets.
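The two steps described in the abstract (conformal p-values, then a multiple-testing threshold) can be illustrated with a minimal sketch. The abstract does not give formulas, so the specifics here are assumptions made for illustration: the score is the residual V(x, y) = y - mu(x), the multiple-testing rule is the Benjamini-Hochberg procedure, the target value `c` is the same for every candidate, and the function names, the stand-in predictor `mu`, and the synthetic data are hypothetical.

```python
import numpy as np


def conformal_pvalues(mu_calib, y_calib, mu_test, c):
    """One-sided conformal p-values for the nulls H_j: Y_test_j <= c.

    Assumed score (not taken from the abstract): V(x, y) = y - mu(x),
    which is monotone increasing in y.
    """
    calib_scores = np.asarray(y_calib) - np.asarray(mu_calib)  # V(X_i, Y_i)
    test_scores = c - np.asarray(mu_test)                      # V(X_test_j, c)
    n = len(calib_scores)
    # Number of calibration scores strictly below each test score.
    below = np.searchsorted(np.sort(calib_scores), test_scores, side="left")
    return (1 + below) / (n + 1)


def benjamini_hochberg(pvals, q):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    passed = np.nonzero(pvals[order] <= q * np.arange(1, m + 1) / m)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    k = passed[-1] + 1  # largest rank whose p-value is under its threshold
    return np.sort(order[:k])


# Synthetic illustration: outcomes are a noisy linear function of one feature.
rng = np.random.default_rng(0)


def simulate(n):
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(0.0, 0.5, size=n)
    return x, y


def mu(x):
    # Stand-in for any pre-trained prediction model (hypothetical).
    return 2.0 * x


x_calib, y_calib = simulate(500)  # labeled calibration candidates
x_test, y_test = simulate(200)    # test outcomes are unobserved in practice

pvals = conformal_pvalues(mu(x_calib), y_calib, mu(x_test), c=1.0)
selected = benjamini_hochberg(pvals, q=0.1)
print(f"selected {selected.size} of {x_test.size} candidates")
```

Under these assumptions each p-value is non-increasing in the test prediction `mu(x_test)`, so the selected set is exactly the set of candidates whose predictions exceed some cutoff determined by the data, which is one way to read the abstract's remark about a data-dependent threshold.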