Paper title
Computational tools for the multiscale analysis of Hi-C data in bacterial chromosomes
Paper authors
Paper abstract
As in eukaryotes, high-throughput chromosome conformation capture (Hi-C) data have revealed that bacterial chromosomes are organized into nested, overlapping interaction domains. In this chapter, we present a multiscale analysis framework that aims to capture and quantify these properties. The framework includes both standard tools (e.g. contact laws) and novel ones, such as an index that identifies loci involved in domain formation independently of the structuring scale at play. Our objective is twofold. On the one hand, we aim to provide complete, understandable Python/Jupyter-based code that can be used by both computer scientists and biologists with no advanced computational background. On the other hand, we discuss statistical issues inherent to Hi-C data analysis, focusing in particular on how to properly assess the statistical significance of results. As a pedagogical example, we analyze data produced in {\it Pseudomonas aeruginosa}, a model pathogenic bacterium. All files (code and input data) can be found in a GitHub repository. We have also embedded the files into a Binder package so that the full analysis can be run on any machine over the internet.
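To make the notion of a contact law concrete, the following is a minimal sketch of how the mean contact frequency could be computed as a function of genomic separation on a circular chromosome. It assumes a square, symmetric, binned contact matrix given as nested lists; the function names `circular_distance` and `contact_law` and the toy matrix are hypothetical and are not taken from the chapter's repository.

```python
def circular_distance(i, j, n):
    """Shortest separation (in bins) between bins i and j on a circle of n bins."""
    d = abs(i - j)
    return min(d, n - d)


def contact_law(matrix):
    """Mean contact frequency at each separation s = 0 .. n // 2 (in bins).

    Assumes `matrix` is a square, symmetric contact matrix for a circular
    chromosome, so bin i and bin (i + s) % n are separated by s bins.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("contact matrix must be square")
    law = []
    for s in range(n // 2 + 1):
        # Wrapping the index with % n accounts for chromosome circularity.
        values = [matrix[i][(i + s) % n] for i in range(n)]
        law.append(sum(values) / n)
    return law


# Toy input (an assumption for illustration): contacts decay as 1 / (1 + s).
n = 8
toy = [[1.0 / (1.0 + circular_distance(i, j, n)) for j in range(n)]
       for i in range(n)]
law = contact_law(toy)
```

On this toy matrix, `law[s]` recovers the decay `1 / (1 + s)` that was used to build it. On real data, the matrix would typically be normalized before computing the law.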