Paper Title

Using Signal Processing in Tandem With Adapted Mixture Models for Classifying Genomic Signals

Authors

Jaiswal, Saish; Nema, Shreya; Murthy, Hema A; Narayanan, Manikandan

Abstract

Genomic signal processing has been used successfully in bioinformatics to analyze biomolecular sequences and to gain insights into DNA structure, gene organization, protein binding, sequence evolution, and more. However, challenges remain in finding an appropriate spectral representation of a biomolecular sequence, especially when multiple variable-length sequences must be handled consistently. In this study, we address this challenge in the context of the well-studied problem of classifying genomic sequences into taxonomic units (strain, phylum, order, etc.). We propose a novel technique that employs signal processing in tandem with Gaussian mixture models to improve the spectral representation of a sequence and, in turn, taxonomic classification accuracy. The sequences are first transformed into spectra and then projected onto a subspace in which sequences belonging to different taxa are more easily distinguished. Our method outperforms a similar state-of-the-art method on established benchmark datasets by an absolute margin of 6.06% in accuracy.
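The abstract does not spell out the pipeline, so the following is only a minimal sketch under stated assumptions, not the authors' method. It illustrates two generic ingredients the abstract names. First, it turns a variable-length DNA sequence into fixed-size spectral frames. Second, it adapts a Gaussian mixture model to each sequence, here using classic relevance-MAP mean adaptation of a background GMM as known from speaker verification, so that every sequence yields one vector of the same length. The purine/pyrimidine numerical mapping, frame length, hop size, relevance factor, the toy background model, and the names `frame_spectra` and `map_adapt_means` are all hypothetical choices. The subspace projection and the classifier are omitted.

```python
import numpy as np

# Hypothetical numerical mapping (purine = -1, pyrimidine = +1); one common choice
# in genomic signal processing, not necessarily the mapping used in the paper.
PP_MAP = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def frame_spectra(seq: str, frame_len: int = 64, hop: int = 32) -> np.ndarray:
    """Map a DNA string to a numeric signal and return per-frame magnitude spectra.

    All frames have the same length, so a sequence of any length yields feature
    vectors of the same dimension (frame_len // 2 + 1). Unknown bases map to 0.
    """
    signal = np.array([PP_MAP.get(base, 0.0) for base in seq.upper()])
    if len(signal) < frame_len:
        # Zero-pad short sequences so they still produce one frame.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    window = np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames * window, axis=1))


def map_adapt_means(features, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the means of a diagonal-covariance GMM.

    features: (T, D) frames of one sequence; weights: (K,); means, variances: (K, D).
    Returns the adapted means concatenated into a fixed-length (K * D,) vector.
    """
    d = features.shape[1]
    diff = features[:, None, :] - means[None, :, :]            # (T, K, D)
    log_det = np.sum(np.log(variances), axis=1)                # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    log_prob = np.log(weights)[None, :] - 0.5 * (
        d * np.log(2 * np.pi) + log_det[None, :] + mahal
    )
    # Posterior responsibilities, stabilised by subtracting the per-frame maximum.
    log_prob -= log_prob.max(axis=1, keepdims=True)
    resp = np.exp(log_prob)
    resp /= resp.sum(axis=1, keepdims=True)                    # (T, K)
    n_k = resp.sum(axis=0)                                     # soft frame counts (K,)
    # First-order statistics: posterior-weighted mean of the frames per component.
    e_k = (resp.T @ features) / np.maximum(n_k, 1e-10)[:, None]
    # Components that saw more frames move further from the background means.
    alpha = (n_k / (n_k + relevance))[:, None]
    adapted = alpha * e_k + (1.0 - alpha) * means
    return adapted.reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seqs = ["".join(rng.choice(list("ACGT"), size=n)) for n in (300, 500, 1000)]
    all_frames = np.vstack([frame_spectra(s) for s in seqs])
    K = 4
    # Toy background model: random frames as means, shared global variance.
    # A real pipeline would train this GMM with EM on pooled training frames.
    ubm_means = all_frames[rng.choice(len(all_frames), K, replace=False)]
    ubm_vars = np.tile(all_frames.var(axis=0) + 1e-3, (K, 1))
    ubm_weights = np.full(K, 1.0 / K)
    vectors = [map_adapt_means(frame_spectra(s), ubm_weights, ubm_means, ubm_vars)
               for s in seqs]
    print([v.shape for v in vectors])  # every sequence gives a (K * 33,) vector
```

Sequences of 300, 500 and 1000 bases all end up as vectors of the same length, which is the property that lets variable-length sequences be compared consistently. These vectors could then be projected onto a discriminative subspace (linear discriminant analysis is one possible choice) before classification.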