Paper title

Pairwise sequence alignment at arbitrarily large evolutionary distance

Authors

Brandon Legried, Sebastien Roch

Abstract

Ancestral sequence reconstruction is a key task in computational biology. It consists in inferring a molecular sequence at an ancestral species of a known phylogeny, given descendant sequences at the tips of the tree. In addition to its many biological applications, it has played a key role in elucidating the statistical performance of phylogeny estimation methods. Here we establish a formal connection to another important bioinformatics problem, multiple sequence alignment, where one attempts to best align a collection of molecular sequences under some mismatch penalty score by inserting gaps. Our result is counter-intuitive: we show that perfect pairwise sequence alignment with high probability is possible in principle at arbitrarily large evolutionary distances, provided the phylogeny is known and dense enough. We use techniques from ancestral sequence reconstruction in the taxon-rich setting together with the probabilistic analysis of sequence evolution models involving insertions and deletions.
