Approximate kNN Classification for Biomedical Data

Paper Title

Approximate kNN Classification for Biomedical Data

Authors

Panagiotis Anagnostou, Petros T. Barmbas, Aristidis G. Vrahatis, Sotiris K. Tasoulis

Abstract

Big Data analytics has changed how biomedical phenomena are interpreted, and as the volume of generated data increases, so does the need for new machine learning methods that can handle it. An indicative example is single-cell RNA-seq (scRNA-seq), an emerging DNA sequencing technology with promising capabilities but significant computational challenges due to the large-scale data it generates. For classifying scRNA-seq data, the k Nearest Neighbor (kNN) classifier is an appropriate method: it is commonly used for large-scale prediction tasks because of its simplicity, minimal parameterization, and model-free nature. However, the ultra-high dimensionality that characterizes scRNA-seq imposes a computational bottleneck, and prediction power can be affected by the "Curse of Dimensionality". In this work, we propose using approximate nearest neighbor search algorithms for kNN classification of scRNA-seq data, focusing on a particular methodology tailored for high-dimensional data. We argue that even relaxed approximate solutions will not significantly affect prediction performance. The experimental results confirm this assumption, offering the potential for broader applicability.
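
To make the idea concrete, the following is a minimal sketch of kNN classification built on an approximate neighbor search. The abstract does not name the specific search methodology, so this sketch substitutes random-hyperplane hashing as a stand-in; it should not be read as the authors' method. All function names, the synthetic toy data, and the parameter choices (8 hyperplanes, k = 3) are assumptions made purely for illustration.

```python
import random
from collections import Counter, defaultdict


def make_toy_data(rng, n_per_class=30, dim=20):
    """Two well-separated synthetic classes (a stand-in for real expression data)."""
    points, labels = [], []
    for label, center in (("a", 10.0), ("b", -10.0)):
        for _ in range(n_per_class):
            points.append([center + rng.uniform(-0.1, 0.1) for _ in range(dim)])
            labels.append(label)
    return points, labels


def make_hyperplanes(n_planes, dim, rng):
    """Draw random Gaussian hyperplanes used to hash points into buckets."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def hash_point(point, planes):
    """Hash a point to a tuple of sign bits, one bit per hyperplane."""
    return tuple(
        sum(p * w for p, w in zip(point, plane)) >= 0.0 for plane in planes
    )


def build_index(points, planes):
    """Group training point indices by hash bucket."""
    buckets = defaultdict(list)
    for i, point in enumerate(points):
        buckets[hash_point(point, planes)].append(i)
    return buckets


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_knn_predict(query, points, labels, planes, buckets, k=3):
    """Majority vote among the k nearest candidates in the query's bucket.

    Only points sharing the query's bucket are compared, which is what makes
    the search approximate. If the bucket holds fewer than k points, the
    function falls back to an exact scan of all points.
    """
    candidates = buckets.get(hash_point(query, planes), [])
    if len(candidates) < k:
        candidates = range(len(points))
    nearest = sorted(
        candidates, key=lambda i: squared_distance(query, points[i])
    )[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Usage on the toy data: index once, then classify a query near class "a".
rng = random.Random(0)
points, labels = make_toy_data(rng)
planes = make_hyperplanes(8, 20, rng)
buckets = build_index(points, planes)
print(approx_knn_predict([10.0] * 20, points, labels, planes, buckets))
```

The trade-off this illustrates is the one the abstract argues for: restricting distance computations to a small candidate set reduces the work per query, at the cost of possibly missing some true nearest neighbors, which may matter little when the final decision is a majority vote.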
