Paper title
Optimal classification and generalized prevalence estimates for diagnostic settings with more than two classes
Paper authors
Paper abstract
An accurate multiclass classification strategy is crucial to interpreting antibody tests. However, traditional methods based on confidence intervals or receiver operating characteristics lack clear extensions to settings with more than two classes. We address this problem by developing a multiclass classification method, based on probabilistic modeling and optimal decision theory, that minimizes a convex combination of false classification rates. The classification process is challenging when the relative fraction of the population in each class, or generalized prevalence, is unknown. Thus, we also develop a method for estimating the generalized prevalence of test data that is independent of classification. We validate our approach on serological data with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) naïve, previously infected, and vaccinated classes. Synthetic data are used to demonstrate that (i) prevalence estimates are unbiased and converge to true values and (ii) our procedure applies to arbitrary measurement dimensions. In contrast to the binary problem, the multiclass setting offers wide-reaching utility as the most general framework and provides new insight into prevalence estimation best practices.
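To make the decision-theoretic idea concrete, here is a minimal sketch, not the authors' implementation. It assumes known one-dimensional Gaussian class-conditional densities P_k(r) with hypothetical parameters, and weights w_k >= 0 summing to 1 for the convex combination. Writing the loss as L = sum_k w_k * (integral of P_k over the region not assigned to class k) = 1 - sum_k (integral of w_k P_k over the region assigned to class k) shows that L is minimized pointwise by assigning each measurement r to the class with the largest w_k P_k(r). When the weights are the generalized prevalences, this is the familiar maximum-posterior rule. The names CLASSES, classify, and weighted_error are illustrative only.

```python
import math

# Hypothetical 1D class-conditional models: class -> (mean, standard deviation).
# Real antibody data would need fitted, possibly multidimensional, densities.
CLASSES = {
    "naive": (0.0, 1.0),
    "infected": (3.0, 1.0),
    "vaccinated": (6.0, 1.5),
}


def gaussian_pdf(x, mean, sd):
    """Normal density, standing in for a class-conditional model P_k(x)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def classify(x, weights):
    """Assign x to the class k that maximizes weights[k] * P_k(x)."""
    return max(CLASSES, key=lambda k: weights[k] * gaussian_pdf(x, *CLASSES[k]))


def weighted_error(rule, weights, lo=-10.0, hi=20.0, n=30000):
    """Midpoint-rule estimate of sum_k weights[k] * P(rule misclassifies | class k)."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        label = rule(x)
        for k, (mean, sd) in CLASSES.items():
            if k != label:
                total += weights[k] * gaussian_pdf(x, mean, sd) * dx
    return total


# Hypothetical prevalences, and a rule that ignores them (equal weights).
prevalence = {"naive": 0.5, "infected": 0.2, "vaccinated": 0.3}
equal = {k: 1.0 / len(CLASSES) for k in CLASSES}

optimal_loss = weighted_error(lambda x: classify(x, prevalence), prevalence)
unweighted_loss = weighted_error(lambda x: classify(x, equal), prevalence)
```

Comparing optimal_loss with unweighted_loss illustrates why prevalence matters: the prevalence-weighted rule never does worse on the prevalence-weighted loss, because at every grid point it picks the label that minimizes the integrand. Shifting the weights moves the class boundaries, for example toward "infected" when that class is more common.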
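The abstract does not spell out the prevalence estimator, so the following illustrates one classification-free approach under stated assumptions; it is not necessarily the authors' construction. Assume the class-conditional densities P_k are known, and partition measurement space into as many bins Omega_j as there are classes (the hypothetical EDGES). The expected fraction of test samples in bin j is f_j = sum_k q_k A_jk, with A_jk the integral of P_k over Omega_j. Replacing f_j by the observed bin fractions and solving the linear system estimates the prevalences q_k without labeling any individual sample. Because the observed fractions are unbiased and the map is linear, the estimate is unbiased and converges as the sample size grows, although in small samples it can fall outside [0, 1]. All names and parameters below are hypothetical.

```python
import math
import random

# Hypothetical 1D class-conditional models: (name, mean, standard deviation).
CLASSES = [("naive", 0.0, 1.0), ("infected", 3.0, 1.0), ("vaccinated", 6.0, 1.5)]
# Hypothetical partition of measurement space into one bin per class.
EDGES = [-math.inf, 1.5, 4.5, math.inf]


def normal_cdf(x, mean, sd):
    """Normal CDF, with the infinite bin edges handled explicitly."""
    if math.isinf(x):
        return 1.0 if x > 0 else 0.0
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def bin_matrix():
    """A[j][k]: probability that a class-k sample lands in bin j."""
    return [
        [normal_cdf(EDGES[j + 1], m, s) - normal_cdf(EDGES[j], m, s) for _, m, s in CLASSES]
        for j in range(len(EDGES) - 1)
    ]


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def estimate_prevalence(samples):
    """Classification-free estimate: solve A q = (observed bin fractions)."""
    counts = [0] * (len(EDGES) - 1)
    for x in samples:
        for j in range(len(counts)):
            if EDGES[j] < x <= EDGES[j + 1]:
                counts[j] += 1
                break
    fractions = [c / len(samples) for c in counts]
    return solve(bin_matrix(), fractions)


def simulate(prevalence, n, seed=0):
    """Draw n synthetic measurements from the mixture sum_k q_k P_k."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        _, m, s = rng.choices(CLASSES, weights=prevalence)[0]
        samples.append(rng.gauss(m, s))
    return samples


true_q = [0.5, 0.2, 0.3]
q_hat = estimate_prevalence(simulate(true_q, 20000))
```

Each column of A sums to 1, so the estimates automatically sum to 1. With more bins than classes, the same idea becomes a least-squares problem, and the choice of bins affects the variance of the estimate.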