Paper title
Estimating Viral Genetic Linkage Rates in the Presence of Missing Data
Paper authors
Paper abstract
Although interest in the use of social and information networks has grown, most inferences on networks assume that the data collected represent the complete network. However, ignoring missing data, even data that are missing completely at random, biases estimators of network-related parameters. In this paper, we focus on constructing estimators for the probability that a randomly selected node has at least one edge, under the assumption that nodes are missing completely at random along with their corresponding edges. Issues also arise in obtaining asymptotic properties for such estimators, because linkage indicators across nodes are correlated, which prevents the direct application of the Central Limit Theorem and the Law of Large Numbers. Using a subsampling approach, we present an improved estimator for our parameter of interest that accommodates missing data. Utilizing the theory of U-statistics, we derive consistency and asymptotic normality of the proposed estimator. This approach decreases the bias in estimating our parameter of interest. We illustrate our approach using HIV viral strains from a large cluster-randomized trial of a combination HIV prevention intervention, the Botswana Combination Prevention Project (BCPP).
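The bias described in the abstract can be seen in a small simulation. The sketch below is not the paper's estimator. It only illustrates, under assumptions chosen for simplicity, why the naive proportion of linked nodes is biased downward when nodes go missing completely at random together with their edges. The toy population (disjoint linked pairs, so the true linkage probability is 1) and the names `linked_fraction`, `simulate`, `n_pairs`, `p_observe` and `n_reps` are all hypothetical.

```python
import random


def linked_fraction(nodes, edges):
    """Fraction of `nodes` with at least one edge to another node in `nodes`."""
    nodes = set(nodes)
    linked = set()
    for u, v in edges:
        # An edge is only visible when both of its endpoints were observed.
        if u in nodes and v in nodes:
            linked.add(u)
            linked.add(v)
    return len(linked) / len(nodes) if nodes else float("nan")


def simulate(n_pairs=500, p_observe=0.5, n_reps=200, seed=0):
    """Compare the true linked fraction with the naive estimate under MCAR."""
    rng = random.Random(seed)
    # Assumed toy population: nodes 2k and 2k+1 are linked, so every node
    # has exactly one edge and the true linkage probability is 1.
    nodes = list(range(2 * n_pairs))
    edges = [(2 * k, 2 * k + 1) for k in range(n_pairs)]
    truth = linked_fraction(nodes, edges)

    naive_estimates = []
    for _ in range(n_reps):
        # Each node is observed independently with probability p_observe;
        # an unobserved node takes its edges with it.
        observed = [v for v in nodes if rng.random() < p_observe]
        if observed:
            naive_estimates.append(linked_fraction(observed, edges))
    return truth, sum(naive_estimates) / len(naive_estimates)


truth, naive_mean = simulate()
print(f"true linkage probability: {truth:.3f}")
print(f"mean naive estimate:      {naive_mean:.3f}")
```

In this toy setting an observed node appears linked only if its single partner was also observed, so the naive estimate concentrates near `p_observe` rather than near the true value of 1. Real linkage networks have more varied degree distributions, and the size of the bias will differ accordingly.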