Paper title
Statistical significance testing for mixed priors: a combined Bayesian and frequentist analysis
Paper authors
Paper abstract
In many hypothesis testing applications, we have mixed priors: well-motivated informative priors for some parameters but not for others. The Bayesian methodology uses the Bayes factor and is helpful when informative priors are available, as it incorporates Occam's razor via the multiplicity, or trials factor, of the look-elsewhere effect. However, if the prior is not completely known, the frequentist hypothesis test based on the false positive rate is a better approach, as it is less sensitive to the choice of prior. We argue that when only partial prior information is available, it is best to combine the two methodologies by using the Bayes factor as the test statistic in a frequentist analysis. We show that the standard frequentist likelihood-ratio test statistic corresponds to the Bayes factor with a non-informative Jeffreys prior. We also show that mixed priors increase the statistical power of frequentist analyses relative to the likelihood-ratio test statistic. Using a statistical mechanics approach to hypothesis testing in Bayesian and frequentist statistics, we develop an analytic formalism that does not require expensive simulations. We introduce the counting of states in a continuous parameter space, using the uncertainty volume as the quantum of the state. We show that both the p-value and the Bayes factor can be expressed as a competition between energy and entropy. We present analytic expressions that generalize Wilks' theorem beyond its usual regime of validity and work in a non-asymptotic regime. In specific limits, the formalism reproduces existing expressions, such as the p-values of linear models and periodograms. We apply the formalism to an example of exoplanet transits, where the multiplicity can exceed $10^7$. We show that our analytic expressions reproduce the p-values derived from numerical simulations.
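To give a sense of why a multiplicity of order $10^7$ matters, the following minimal Python sketch combines two textbook ingredients: the asymptotic Wilks p-value for one extra parameter, and a trials-factor correction that assumes independent trials. It is not the analytic formalism of the paper. The function names `wilks_p_value_1dof` and `global_p_value` are hypothetical, and the independence assumption is a simplification made only for illustration.

```python
import math


def wilks_p_value_1dof(delta_log_like):
    """Asymptotic local p-value for one extra parameter (Wilks' theorem).

    delta_log_like is the log-likelihood gain of the alternative over the null.
    Assumes the asymptotic regime, where q = 2 * delta_log_like follows a
    chi-square distribution with one degree of freedom under the null.
    """
    q = 2.0 * delta_log_like
    # Survival function of chi-square with 1 dof: P(X > q) = erfc(sqrt(q / 2))
    return math.erfc(math.sqrt(q / 2.0))


def global_p_value(p_local, multiplicity):
    """Trials-factor correction, assuming `multiplicity` independent trials.

    Evaluates 1 - (1 - p_local) ** multiplicity in a numerically stable way.
    """
    return -math.expm1(multiplicity * math.log1p(-p_local))


if __name__ == "__main__":
    # Illustrative numbers only: a log-likelihood gain of 12.5 gives q = 25,
    # which corresponds to a 5-sigma local excess.
    p_local = wilks_p_value_1dof(12.5)
    p_global = global_p_value(p_local, 1e7)
    print(f"local p-value:  {p_local:.3e}")
    print(f"global p-value: {p_global:.3f}")
```

Under these assumptions, a local excess that looks highly significant in isolation can become unremarkable once $10^7$ independent trials are accounted for, which is the look-elsewhere effect the abstract refers to.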