Paper Title
Going From Molecules to Genomic Variations to Scientific Discovery: Intelligent Algorithms and Architectures for Intelligent Genome Analysis
Paper Authors
Paper Abstract
We now need more than ever to make genome analysis more intelligent. We need to read, analyze, and interpret our genomes not only quickly, but also accurately and efficiently enough to scale the analysis to population level. There currently exist major computational bottlenecks and inefficiencies throughout the entire genome analysis pipeline, because state-of-the-art genome sequencing technologies are still not able to read a genome in its entirety. We describe the ongoing journey in significantly improving the performance, accuracy, and efficiency of genome analysis using intelligent algorithms and hardware architectures. We explain state-of-the-art algorithmic methods and hardware-based acceleration approaches for each step of the genome analysis pipeline and provide experimental evaluations. Algorithmic approaches exploit the structure of the genome as well as the structure of the underlying hardware. Hardware-based acceleration approaches exploit specialized microarchitectures or various execution paradigms (e.g., processing inside or near memory) along with algorithmic changes, leading to new hardware/software co-designed systems. We conclude with a foreshadowing of future challenges, benefits, and research directions triggered by the development of both very low-cost yet highly error-prone new sequencing technologies and specialized hardware chips for genomics. We hope that these efforts and the challenges we discuss provide a foundation for future work in making genome analysis more intelligent. The analysis script and data used in our experimental evaluation are available at: https://github.com/CMU-SAFARI/Molecules2Variations