Paper title
The theoretical analysis of sequencing bioinformatics algorithms and beyond
Paper authors
Abstract
The theoretical analysis of performance has been an important tool in the engineering of algorithms in many application domains. Its goals are to predict the empirical performance of an algorithm and to be a yardstick that drives the design of novel algorithms that perform well in practice. While these goals have been achieved in many instances, they have not been achieved ubiquitously across crucial application domains. I provide a case study in the area of sequencing bioinformatics, an interdisciplinary field that uses algorithms to extract biological meaning from genome sequencing data. In particular, I give three concrete examples: two showing how theoretical analysis has failed to achieve its goals and one showing how it has been successful. I then catalog some of the challenges of applying theoretical analysis to sequencing bioinformatics, argue why empirical analysis is not enough, and give a vision for improving the relevance of theoretical analysis to sequencing bioinformatics. By recognizing the problem, understanding its roots, and providing potential solutions, this work can hopefully serve as a crucial first step towards making theoretical analysis more relevant in sequencing bioinformatics and potentially in other fast-paced application domains.