Paper title
Bayesian Reconstruction and Differential Testing of Excised mRNA
Paper authors
Abstract
Characterizing the differential excision of mRNA is critical for understanding the functional complexity of a cell or tissue, from normal developmental processes to disease pathogenesis. Most transcript reconstruction methods infer full-length transcripts from high-throughput sequencing data. However, this is a challenging task due to incomplete annotations and the differential expression of transcripts across cell types, tissues, and experimental conditions. Several recent methods circumvent these difficulties by considering local splicing events, but these methods lose transcript-level splicing information and may conflate transcripts. We develop the first probabilistic model that reconciles the transcript and local splicing perspectives. First, we formalize the sequence of mRNA excisions (SME) reconstruction problem, which aims to assemble variable-length sequences of mRNA excisions from RNA-sequencing data. We then present a novel hierarchical Bayesian admixture model for the Reconstruction of Excised mRNA (BREM). BREM interpolates between local splicing events and full-length transcripts and thus focuses only on SMEs that have high posterior probability. We develop posterior inference algorithms based on Gibbs sampling and local search of independent sets, and we characterize differential SME usage using generalized linear models based on converged BREM model parameters. On simulated data, BREM achieves a higher F1 score for reconstruction tasks and improved accuracy and sensitivity in differential splicing when compared with four state-of-the-art transcript and local splicing methods. Lastly, we evaluate BREM on both bulk and scRNA sequencing data based on transcript reconstruction, novelty of transcripts produced, model sensitivity to hyperparameters, and a functional analysis of differentially expressed SMEs, demonstrating that BREM captures relevant biological signal.
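For readers unfamiliar with the reconstruction metric, here is a minimal sketch of how an F1 score over reconstructed SMEs could be computed. It is an illustration, not the evaluation procedure of the paper. The representation of an SME as a tuple of (start, end) excision coordinates, the exact-match criterion, and the name `sme_f1_score` are all assumptions made for this example.

```python
from typing import Iterable, Tuple

# Assumed representation: an SME is an ordered tuple of excisions,
# and each excision is a (start, end) coordinate pair.
Excision = Tuple[int, int]
SME = Tuple[Excision, ...]


def sme_f1_score(predicted: Iterable[SME], truth: Iterable[SME]) -> float:
    """F1 score of predicted SMEs against a reference set.

    A predicted SME counts as a true positive only if it matches a
    reference SME exactly (an assumption of this sketch).
    """
    pred_set, true_set = set(predicted), set(truth)
    true_positives = len(pred_set & true_set)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case.
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(true_set)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three reference SMEs, three predictions,
# two of which match the reference exactly.
reference = [((100, 200),), ((100, 200), (300, 400)), ((500, 600),)]
predictions = [((100, 200),), ((100, 200), (300, 400)), ((700, 800),)]
score = sme_f1_score(predictions, reference)
```

In this toy case precision and recall are both 2/3, so the F1 score is also 2/3. A looser matching rule, such as tolerating small coordinate offsets, would change the counts but not the formula.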