Paper Title

SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping

Authors

Damla Senol Cali, Konstantinos Kanellopoulos, Joel Lindegger, Zülal Bingöl, Gurpreet S. Kalsi, Ziyi Zuo, Can Firtina, Meryem Banu Cavlak, Jeremie Kim, Nika Mansouri Ghiasi, Gagandeep Singh, Juan Gómez-Luna, Nour Almadhoun Alserr, Mohammed Alser, Sreenivas Subramoney, Can Alkan, Saugata Ghose, Onur Mutlu

Abstract

A critical step of genome sequence analysis is the mapping of sequenced DNA fragments (i.e., reads) collected from an individual to a known linear reference genome sequence (i.e., sequence-to-sequence mapping). Recent works replace the linear reference sequence with a graph-based representation of the reference genome, which captures the genetic variations and diversity across many individuals in a population. Mapping reads to the graph-based reference genome (i.e., sequence-to-graph mapping) results in notable quality improvements in genome analysis. Unfortunately, while sequence-to-sequence mapping is well studied with many available tools and accelerators, sequence-to-graph mapping is a more difficult computational problem, with a much smaller number of practical software tools currently available. We analyze two state-of-the-art sequence-to-graph mapping tools and reveal four key issues. We find that there is a pressing need to have a specialized, high-performance, scalable, and low-cost algorithm/hardware co-design that alleviates bottlenecks in both the seeding and alignment steps of sequence-to-graph mapping. To this end, we propose SeGraM, a universal algorithm/hardware co-designed genomic mapping accelerator that can effectively and efficiently support both sequence-to-graph mapping and sequence-to-sequence mapping, for both short and long reads. To our knowledge, SeGraM is the first algorithm/hardware co-design for accelerating sequence-to-graph mapping. SeGraM consists of two main components: (1) MinSeed, the first minimizer-based seeding accelerator; and (2) BitAlign, the first bitvector-based sequence-to-graph alignment accelerator. We demonstrate that SeGraM provides significant improvements for multiple steps of the sequence-to-graph and sequence-to-sequence mapping pipelines.
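The abstract names minimizer-based seeding without describing it. As a rough illustration of the general idea only, and not of the MinSeed hardware design, the sketch below picks the smallest k-mer from every window of `w` consecutive k-mers. The function name `minimizers`, the parameters `k` and `w`, and the use of lexicographic order are all assumptions made for this example; practical tools typically order k-mers by a hash value instead.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs selected as window minimizers.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Every k-mer of the sequence, indexed by its start position.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    chosen = set()
    # Slide a window over w consecutive k-mers.
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Leftmost smallest k-mer in the window (lexicographic order is an
        # assumption of this sketch).
        offset = min(range(w), key=lambda j: window[j])
        chosen.add((start + offset, window[offset]))
    return sorted(chosen)


if __name__ == "__main__":
    for position, kmer in minimizers("ACGTACGT", k=3, w=2):
        print(position, kmer)
```

Because adjacent windows often share the same smallest k-mer, the selected set is usually much smaller than the full set of k-mers, which is what makes minimizers attractive as seeds for locating candidate mapping positions.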
