Paper Title

Identifying OCRs in cfDNA WGS Data by Correlation Clustering

Authors

Noravesh, Farshad; Palizban, Fahimeh

Abstract

Over the past decade, the emergence of liquid biopsy has significantly improved cancer monitoring and detection. Dying cells, including those originating from tumors, shed their DNA into the bloodstream and contribute to a pool of circulating fragments called cell-free DNA (cfDNA). Identifying the tissue of origin of these DNA fragments from their epigenetic features has implications in various clinical contexts. Open chromatin regions (OCRs) are important epigenetic features of DNA that reflect the cell types of origin. Profiling these features by DNase-seq, ATAC-seq, and histone ChIP-seq provides insights into tissue-specific and disease-specific regulatory mechanisms. Integration of genomic and epigenomic features for cancer detection by liquid biopsy has been reported previously. However, many multimodal analyses require large amounts of cfDNA input and/or multiple types of experiments to cover the genomic and epigenomic aspects of a single sample, which is cost- and time-prohibitive. Methods that capture genomic and epigenomic profiles in a single experiment type with low input requirements are therefore important. Predicting OCRs from whole genome sequencing (WGS) data is one such approach. Here, we applied a correlation clustering algorithm to predict OCRs, using local sequencing depth as input. The following processing steps were then applied: count normalization, discrete Fourier transform, graph construction, graph cut optimization by linear programming, and clustering. To validate the proposed method, we compared our predictions (OCR vs. non-OCR) with previously validated open chromatin regions from human blood samples in ATAC-db. The overlap between the two sets is greater than 67%.
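The processing steps listed in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and every name in it (`normalize`, `dft_features`, `pearson`, `cluster_windows`) is hypothetical. It assumes fixed-size windows of binned depth, drops the zero-frequency DFT term, uses the sign of the Pearson correlation between window spectra (relative to an arbitrary threshold) as the edge label, and swaps in a simple greedy pivot heuristic in place of the linear-programming graph-cut step, which the abstract does not specify in detail. The depth values are synthetic.

```python
import cmath
import math

def normalize(counts):
    """Scale binned read counts to mean 1 (assumed stand-in for count normalization)."""
    mean = sum(counts) / len(counts)
    return [c / mean for c in counts]

def dft_features(window):
    """DFT magnitudes for frequencies 1..n/2 (zero-frequency term dropped by assumption)."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window)))
        for k in range(1, n // 2 + 1)
    ]

def pearson(a, b):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def cluster_windows(depth, window=8, threshold=0.5):
    """Cluster windows of a depth track by similarity of their spectra.

    Edges with correlation above `threshold` count as "similar"; clusters are
    formed with a greedy pivot heuristic (NOT the LP-based optimization).
    """
    norm = normalize(depth)
    feats = [dft_features(norm[i:i + window])
             for i in range(0, len(norm) - window + 1, window)]
    unassigned = list(range(len(feats)))
    clusters = []
    while unassigned:
        pivot = unassigned[0]
        members = [pivot] + [j for j in unassigned[1:]
                             if pearson(feats[pivot], feats[j]) > threshold]
        clusters.append(members)
        unassigned = [j for j in unassigned if j not in members]
    return clusters

# Synthetic depth: two window types with different periodicity (period 2 vs. period 4).
type_a = [10, 2, 10, 2, 10, 2, 10, 2]
type_b = [4, 6, 8, 6, 4, 6, 8, 6]
depth = type_a + type_b + type_a + type_b + type_b

clusters = cluster_windows(depth)
print(clusters)  # [[0, 2], [1, 3, 4]]
```

Deciding which resulting cluster corresponds to OCR versus non-OCR, and computing the overlap with a reference set, are separate steps not shown here.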
