Paper title

AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs

Authors

André Müller, Bertil Schmidt, Richard Membarth, Roland Leißa, Sebastian Hack

Abstract

In recent years, the rapidly increasing number of reads produced by next-generation sequencing (NGS) technologies has driven the demand for efficient implementations of sequence alignment in bioinformatics. However, current state-of-the-art approaches are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance. We present AnySeq/GPU, a sequence alignment library that augments the AnySeq1 library with a novel approach for accelerating dynamic programming (DP) alignment on GPUs by minimizing memory accesses using warp shuffles and half-precision arithmetic. Our implementation is based on the AnyDSL compiler framework, which allows for convenient zero-cost abstractions through guaranteed partial evaluation. We show that our approach achieves over 80% of the peak performance on both NVIDIA and AMD GPUs, thereby outperforming the GPU-based alignment libraries AnySeq1, GASAL2, ADEPT, and NVBIO by a factor of at least 3.6, while achieving a median speedup of 19.2x over these tools across different alignment scenarios and sequence lengths when running on the same hardware. This leads to throughputs of up to 1.7 TCUPS (tera cell updates per second) on an NVIDIA GV100, up to 3.3 TCUPS with half-precision arithmetic on a single NVIDIA A100, and up to 3.8 TCUPS on an AMD MI100.
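To make the throughput metric concrete, the following is a minimal CPU-side sketch of a score-only global DP alignment that counts cell updates, together with a helper that converts a cell count and a runtime into TCUPS. It is not the AnySeq/GPU implementation: the function names, the linear gap model, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration, and no GPU techniques such as warp shuffles or half-precision arithmetic are shown.

```python
def global_alignment_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Score-only global alignment with a linear gap penalty.

    Scoring values are illustrative assumptions, not taken from the paper.
    Returns (score, number_of_cell_updates).
    """
    m, n = len(query), len(subject)
    # A score-only pass needs just the previous DP row, not the full matrix.
    prev = [j * gap for j in range(n + 1)]
    cells = 0
    for i in range(1, m + 1):
        curr = [i * gap] + [0] * n
        for j in range(1, n + 1):
            sub = match if query[i - 1] == subject[j - 1] else mismatch
            # Each DP cell depends on its diagonal, upper, and left neighbors.
            curr[j] = max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            cells += 1
        prev = curr
    return prev[n], cells


def tcups(cell_updates, seconds):
    """Tera cell updates per second: cell updates / runtime / 10^12."""
    return cell_updates / seconds / 1e12


if __name__ == "__main__":
    score, cells = global_alignment_score("ACGT", "ACGT")
    print(score, cells)  # prints "8 16"
```

Aligning two sequences of lengths m and n updates m x n cells, so under this definition a throughput of 3.3 TCUPS corresponds to 3.3 x 10^12 such cell updates per second.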
