Paper title
Online multiple hypothesis testing
Paper authors
Paper abstract
Modern data analysis frequently involves large-scale hypothesis testing, which naturally gives rise to the problem of maintaining control of a suitable type I error rate, such as the false discovery rate (FDR). In many biomedical and technological applications, an additional complexity is that hypotheses are tested in an online manner, one by one over time. However, traditional procedures that control the FDR, such as the Benjamini-Hochberg procedure, assume that all p-values are available to be tested at a single time point. To address these challenges, a new field of methodology has developed over the past 15 years showing how to control error rates for online multiple hypothesis testing. In this framework, hypotheses arrive in a stream, and at each time point the analyst decides whether to reject the current hypothesis based both on the evidence against it and on the previous rejection decisions. In this paper, we present a comprehensive exposition of the literature on online error rate control, with a review of key theory as well as a focus on applied examples. We also provide simulation results comparing different online testing algorithms and an up-to-date overview of the many methodological extensions that have been proposed.
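
The online framework described in the abstract can be illustrated with a minimal sketch. It assumes the LOND rule (Javanmard and Montanari), one of the online FDR procedures in this literature, which is not necessarily the algorithm the paper emphasizes. The function name `lond`, the example p-values, and the choice of the sequence beta_t = alpha * 6 / (pi^2 t^2) are assumptions made for illustration. Each hypothesis is tested at level alpha_t = beta_t * (D + 1), where D is the number of rejections made so far, so the decision depends on both the current p-value and the earlier decisions.

```python
import math


def lond(p_values, alpha=0.05):
    """Sketch of a LOND-style online test; returns one reject decision per p-value."""
    decisions = []
    discoveries = 0  # number of rejections made before the current time point
    for t, p in enumerate(p_values, start=1):
        # Assumed sequence: nonnegative and summing to alpha over t = 1, 2, ...
        beta_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
        # The test level grows with the number of earlier discoveries.
        alpha_t = beta_t * (discoveries + 1)
        reject = p <= alpha_t
        decisions.append(reject)
        if reject:
            discoveries += 1
    return decisions


if __name__ == "__main__":
    # Hypothetical p-values, processed strictly in arrival order.
    print(lond([0.001, 0.02, 0.5, 0.003]))
    print(lond([0.5, 0.5, 0.5, 0.003]))
```

In the first stream the early rejection raises the level available at the fourth time point, so the p-value 0.003 is rejected. In the second stream no earlier rejection exists and the same p-value is not rejected. Unlike the Benjamini-Hochberg procedure, the rule never needs the full set of p-values at once.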