Paper title
Estimation of Covid-19 Prevalence from Serology Tests: A Partial Identification Approach
Paper authors
Paper abstract
We propose a partial identification method for estimating disease prevalence from serology studies. Our data are results from antibody tests in some population sample, where the test parameters, such as the true/false positive rates, are unknown. Our method scans the entire parameter space, and rejects parameter values using the joint data density as the test statistic. The proposed method is conservative for marginal inference, in general, but its key advantage over more standard approaches is that it is valid in finite samples even when the underlying model is not point identified. Moreover, our method requires only independence of serology test results, and does not rely on asymptotic arguments, normality assumptions, or other approximations. We use recent Covid-19 serology studies in the US, and show that the parameter confidence set is generally wide, and cannot support definite conclusions. Specifically, recent serology studies from California suggest a prevalence anywhere in the range 0%-2% (at the time of study), and are therefore inconclusive. However, this range could be narrowed down to 0.7%-1.5% if the actual false positive rate of the antibody test was indeed near its empirical estimate (~0.5%). In another study from New York state, Covid-19 prevalence is confidently estimated in the range 13%-17% in mid-April of 2020, which also suggests significant geographic variation in Covid-19 exposure across the US. Combining all datasets yields a 5%-8% prevalence range. Our results overall suggest that serology testing on a massive scale can give crucial information for future policy design, even when such tests are imperfect and their parameters unknown.
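The scan-and-reject procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: the data are taken to be three independent binomial counts (false positives among known negatives, true positives among known positives, and positives in the main study), the parameter space is a coarse grid, and the p-value at each grid point is the exact probability of observing an outcome whose joint density is no larger than that of the data. The sample sizes and counts are made up and kept small so that exact enumeration stays fast; the names `p_value` and `confidence_set` are hypothetical.

```python
from itertools import product
from math import comb


def binom_pmf(n, p):
    """Return [P(S = k) for k = 0..n] where S ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def p_value(obs, sizes, theta):
    """Exact p-value at theta, using the joint density as the test statistic.

    obs   = (false positives among known negatives,
             true positives among known positives,
             positives in the main study)
    sizes = the three corresponding sample sizes
    theta = (prevalence, false positive rate, true positive rate)
    """
    prev, fpr, tpr = theta
    q = prev * tpr + (1 - prev) * fpr  # chance that a main-study test is positive
    pmf_neg, pmf_pos, pmf_main = (
        binom_pmf(n, p) for n, p in zip(sizes, (fpr, tpr, q))
    )
    f_obs = pmf_neg[obs[0]] * pmf_pos[obs[1]] * pmf_main[obs[2]]
    cutoff = f_obs * (1 + 1e-12)  # small tolerance so ties count despite rounding
    # Add up the probability of every outcome that is no more likely than the data.
    total = 0.0
    for pa in pmf_neg:
        for pb in pmf_pos:
            pab = pa * pb
            for pc in pmf_main:
                if pab * pc <= cutoff:
                    total += pab * pc
    return total


def confidence_set(obs, sizes, grid, alpha=0.05):
    """Keep every grid point whose p-value is at least alpha."""
    return [theta for theta in grid if p_value(obs, sizes, theta) >= alpha]


# Made-up data for illustration only (not taken from any study).
sizes = (20, 10, 40)  # known negatives, known positives, main-study sample
obs = (1, 9, 4)       # positives observed in each group

# Coarse grid over (prevalence, false positive rate, true positive rate).
grid = list(
    product([i / 100 for i in range(0, 51, 5)], [0.0, 0.05, 0.1], [0.8, 0.9, 1.0])
)

accepted = confidence_set(obs, sizes, grid)
prevalences = [theta[0] for theta in accepted]
lo, hi = min(prevalences), max(prevalences)
print(f"prevalence values not rejected: {lo:.2f} to {hi:.2f}")
```

Reporting only the smallest and largest surviving prevalence projects the three-dimensional confidence set onto one coordinate, which is why marginal inference of this kind is conservative. In this toy example a prevalence of zero is not rejected, because a false positive rate of 10% alone can explain 4 positives out of 40; this mirrors the inconclusive low-prevalence case described in the abstract.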