Calibration of P-values for calibration and for deviation of a subpopulation from the full population

Paper title

Calibration of P-values for calibration and for deviation of a subpopulation from the full population

Author

Tygert, Mark

Abstract

The author's recent research papers, "Cumulative deviation of a subpopulation from the full population" and "A graphical method of cumulative differences between two subpopulations" (both published in volume 8 of Springer's open-access "Journal of Big Data" during 2021), propose graphical methods and summary statistics, without extensively calibrating formal significance tests. The summary metrics and methods can measure the calibration of probabilistic predictions and can assess differences in responses between a subpopulation and the full population while controlling for a covariate or score via conditioning on it. These recently published papers construct significance tests based on the scalar summary statistics, but only sketch how to calibrate the attained significance levels (also known as "P-values") for the tests. The present article reviews and synthesizes work spanning many decades in order to detail how to calibrate the P-values. The present paper presents computationally efficient, easily implemented numerical methods for evaluating properly calibrated P-values, together with rigorous mathematical proofs guaranteeing their accuracy, and illustrates and validates the methods with open-source software and numerical examples.
