Paper title
A Tale of Two Datasets: Representativeness and Generalisability of Inference for Samples of Networks
Authors
Abstract
The last two decades have seen considerable progress in foundational aspects of statistical network analysis, but the path from theory to application is not straightforward. Two large, heterogeneous samples of small networks of within-household contacts in Belgium were collected using two different but complementary sampling designs: one smaller but with all contacts in each household observed, the other larger and more representative but recording contacts of only one person per household. We wish to combine their strengths to learn the social forces that shape household contact formation and facilitate simulation for prediction of disease spread, while generalising to the population of households in the region. To accomplish this, we describe a flexible framework for specifying multi-network models in the exponential family class and identify the requirements for inference and prediction under this framework to be consistent, identifiable, and generalisable, even when data are incomplete; explore how these requirements may be violated in practice; and develop a suite of quantitative and graphical diagnostics for detecting violations and suggesting improvements to candidate models. We report on the effects of network size, geography, and household roles on household contact patterns (activity, heterogeneity in activity, and triadic closure).