Paper Title

Adaptive preferential sampling in phylodynamics

Authors

Lorenzo Cappello, Julia A. Palacios

Abstract

Longitudinal molecular data of rapidly evolving viruses and pathogens provide information about disease spread and complement traditional surveillance approaches based on case count data. The coalescent is used to model the genealogy that represents the sample ancestral relationships. The basic assumption is that coalescent events occur at a rate inversely proportional to the effective population size $N_{e}(t)$, a time-varying measure of genetic diversity. When the sampling process (collection of samples over time) depends on $N_{e}(t)$, the coalescent and the sampling processes can be jointly modeled to improve estimation of $N_{e}(t)$. Failing to do so can lead to bias due to model misspecification. However, the way that the sampling process depends on the effective population size may vary over time. We introduce an approach where the sampling process is modeled as an inhomogeneous Poisson process with rate equal to the product of $N_{e}(t)$ and a time-varying coefficient, making minimal assumptions on their functional shapes via Markov random field priors. We provide scalable algorithms for inference, show the model performance vis-a-vis alternative methods in a simulation study, and apply our model to SARS-CoV-2 sequences from Los Angeles and Santa Clara counties. The methodology is implemented and available in the R package adapref.
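To make the sampling model in the abstract concrete, the following is a minimal sketch of drawing sampling times from an inhomogeneous Poisson process whose rate is the product of $N_{e}(t)$ and a time-varying coefficient, written here as $\beta(t) N_{e}(t)$. The symbol $\beta(t)$, the specific shapes chosen for both functions, and all function names are illustrative assumptions, not taken from the paper or from the adapref package. The simulation uses standard thinning (Lewis-Shedler), which is a generic technique and not necessarily what the authors use.

```python
import math
import random


def effective_pop_size(t):
    """Hypothetical N_e(t); t is time measured backwards from the most recent sample."""
    return 100.0 * math.exp(-0.05 * t)


def sampling_coefficient(t):
    """Hypothetical time-varying coefficient beta(t): sampling effort drops after t = 30."""
    return 0.02 if t < 30.0 else 0.005


def sampling_rate(t):
    """Sampling intensity: product of beta(t) and N_e(t), as described in the abstract."""
    return sampling_coefficient(t) * effective_pop_size(t)


def simulate_sampling_times(rate, t_max, rate_upper_bound, rng):
    """Simulate an inhomogeneous Poisson process on [0, t_max] by thinning.

    rate_upper_bound must dominate rate(t) on the whole interval.
    """
    times = []
    t = 0.0
    while True:
        # Propose the next event from a homogeneous process at the bounding rate.
        t += rng.expovariate(rate_upper_bound)
        if t > t_max:
            break
        # Keep the proposal with probability rate(t) / rate_upper_bound.
        if rng.random() * rate_upper_bound <= rate(t):
            times.append(t)
    return times


rng = random.Random(1)  # fixed seed so the sketch is reproducible
# The assumed rate peaks at t = 0 with value 0.02 * 100 = 2.0, so 2.0 is a valid bound.
sampling_times = simulate_sampling_times(
    sampling_rate, t_max=60.0, rate_upper_bound=2.0, rng=rng
)
print(len(sampling_times), "sampling times simulated")
```

Under these assumed shapes, samples concentrate where both the effective population size and the coefficient are large. Fixing the coefficient to a constant recovers a sampling intensity strictly proportional to $N_{e}(t)$, which is the kind of time-invariant dependence the abstract argues may be too restrictive.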
